Remarks

Claims 24, 26-28, 31-35 and 37-40 were rejected under 35 USC §112, first 
paragraph, as failing to comply with the written description requirement. Applicants 
respectfully submit that the recitation of "fungal disease resistance activity when 
expressed in a plant" does not constitute new matter. 

It is stated on page 2, lines 5-19 of the specification that: "One such example 
is the Mlo gene of barley, which conveys resistance to Erysiphe graminis f. sp. 
hordei. The barley gene has been recently isolated by a positional cloning approach 
(Bueschges et al. (1997) Cell 88:695-705, copy enclosed). The dominant (sensitive) 
allele (Mlo) is thought to encode a protein involved in regulation of leaf cell death and 
in the onset of pathogen defense. The partial or complete inactivation of Mlo results 
in the priming of the disease-resistance response even in the absence of the 
pathogen, and leads to increased resistance to E. graminis."

Accordingly, since the specification discloses that the partial or complete
inactivation of Mlo resulted in disease resistance of the barley plant to the fungus
Erysiphe graminis, it is respectfully submitted that this does constitute sufficient
support for the recitation of "fungal disease resistance activity when expressed in a
plant."

Withdrawal of the rejection of the claims under 35 USC §112, first
paragraph, is respectfully requested in view of the above discussion.

Claims 24, 26, 31-35 and 37-40 were rejected under 35 USC §112, first
paragraph, as failing to comply with the written description requirement. 

It is respectfully submitted that the specification discloses to one of ordinary 
skill in the art a representative number of Mlo polypeptides having at least 90% 
sequence identity with the sequence set forth in SEQ ID NO:32, and not just a single 
polynucleotide encoding SEQ ID NO:32. 

The specification, at page 7, lines 3 to 32, discloses alterations in nucleotide
sequence that are not expected to alter functionality, such as alterations that produce
a chemically equivalent amino acid at a given site or alterations in the N- or C-
terminal portions. Thus, from the foregoing, the skilled artisan would immediately
understand the specification to disclose a representative number of polynucleotide
sequences, having different nucleotide substitutions, that encode Mlo polypeptides
varying from SEQ ID NO:32 while retaining at least 90% sequence identity to it.

Claims 24, 26-28, 31-35 and 37-40 were rejected under 35 USC §112, first
paragraph, on the ground that the specification is not enabling for SEQ ID NO:32 and
sequences having the recited 90-95% sequence identity.

Attention is kindly invited to Devoto et al., JBC (1999) 274:34993-35004 (copy
previously submitted) and Bueschges et al., Cell (1997) 88:695-705 (copy
enclosed). The publications disclose seven distinctive Mlo transmembrane-spanning
domains; a nuclear localization signal, which consists of short sequences containing
mainly basic residues as described in Garcia et al., BBA (1991) 1071:83-101 (copy
enclosed for examiner's convenience); and two copies of the conserved casein
kinase II motif (S/T-X-X-D/E), in addition to four cysteine residues strictly conserved
among all family members, located in extracellular loops 1 and 3 and indicative of
their involvement in the formation of disulfide bridges.
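The casein kinase II consensus quoted above (S/T-X-X-D/E) can be located in a protein sequence with a simple pattern search. The following is a minimal sketch only: the function name and the example peptide are hypothetical and are not taken from the specification, the cited publications, or Appendix A.

```python
import re

# Consensus quoted in the text: S/T-X-X-D/E, where X is any residue.
# The lookahead lets overlapping occurrences be reported as well.
CK2_MOTIF = re.compile(r"(?=([ST]..[DE]))")


def find_ck2_sites(protein):
    """Return (1-based start position, matched tetrapeptide) for each hit."""
    return [(m.start() + 1, m.group(1))
            for m in CK2_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical peptide, invented only to exercise the function.
    example = "MKSAAEGGTPLDKK"
    print(find_ck2_sites(example))
```

A pattern match of this kind only flags candidate sites; whether a given site is functional is a separate question that the sketch does not address.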

Enclosed herewith in Appendix A is a comparison of the claimed sequence 
with the barley sequence disclosed by Bueschges. This comparison demonstrates 
the sequence of the invention possesses these distinctive, highly conserved, Mlo 
regions. 

Eleven mutagen-induced Mlo resistance alleles were identified by Bueschges.
The identified mutations comprised point mutations or deletions, and all
conferred pathogen resistance. The mutations clustered towards certain areas of the
protein, which could be indicative of functionally sensitive domains. For the
Examiner's convenience, all of the sites of mutation are labeled in the barley
sequence in Appendix A. All the labeled residues are conserved between the barley
sequence and the claimed sequence of SEQ ID NO:32. One skilled in the art would
appreciate that the more highly conserved a residue is, the less likely it is that the
residue could be modified with function maintained.

This alignment and the domains shown in the attached Appendix illustrate that one
of ordinary skill in the art could quickly determine which amino acid residues might be
modified in SEQ ID NO:32 without a likely change in function. Since SEQ ID NO:32
and the barley sequence share only 87% identity, one of skill in the art would have
appreciated that many variants sharing at least 90% sequence identity to SEQ ID
NO:32 would have been expected to retain Mlo activity. Indeed, full
complementation between barley and wheat Mlo mutants was recently achieved,
confirming the expectation that Mlo from divergent species is functionally similar
(Devoto et al., J Mol Evol (2003) 56:77-88, copy enclosed for examiner's
convenience). Thus, the known correlation of structure to function, combined with
the above discussion, does enable one skilled in the art to make and use the claimed
invention commensurate in scope with the claims without undue experimentation.
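The 87% and 90% figures discussed above are percent-identity values computed over an alignment. A minimal sketch of one common way to compute such a value from two already-aligned sequences is given below. It assumes that identity is counted over ungapped alignment columns; the claims or specification may define identity differently (for example with a particular alignment program and parameters), and the function names and example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are skipped under this assumption
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold=90.0):
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical ten-residue alignment with one mismatch.
    print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))
    print(meets_threshold("MKTAYIAKQR", "MKTAYLAKQR"))
```

Because the denominator convention changes the result, any real comparison should state which convention was used.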

It is respectfully submitted that the claims are now in form for allowance, which
allowance is respectfully requested.
Summary

Mutation-induced recessive alleles (mlo) of the barley
Mlo locus confer a leaf lesion phenotype and broad
spectrum resistance to the fungal pathogen, Erysiphe
graminis f. sp. hordei. The gene has been isolated
using a positional cloning approach. Analysis of 11
mutagen-induced mlo alleles revealed mutations leading
in each case to alterations of the deduced Mlo
wild-type amino acid sequence. Susceptible intragenic
recombinants, isolated from mlo heteroallelic
crosses, show restored Mlo wild-type sequences. The
deduced 60 kDa protein is predicted to be membrane-anchored
by at least six membrane-spanning helices.
The findings are compatible with a dual negative control
function of the Mlo protein in leaf cell death and
in the onset of pathogen defense; absence of Mlo
primes the responsiveness for the onset of multiple
defense functions.

Introduction 

In plants, resistance to specialized pathogens is frequently
triggered by a recognition event followed by
a coordinated complex defense response resulting
in [...] containment of the intruder (Hammond-Kosack
and Jones, 1996). In this type of plant-pathogen
interaction, resistance is specified by and dependent
on the presence of two complementary genes, one from
the host and one from the pathogen (Flor, 1971). The
complementary genes have been termed race-specific
resistance gene and avirulence gene, respectively. Several
resistance genes have been isolated and appear to
[...] either contain a leucine-rich region
(LRR) with or without an attached nucleotide binding
site (NBS), indicative of ligand-binding and protein-protein
interaction. Another class encodes a simple
serine/threonine kinase (Dangl, 1995; Staskawicz et al.,
1995; Zhou et al., 1995). The genetic and molecular
observations are compatible with a specific receptor-mediated
signal response triggering pathogen defense. The
[...] of resistance gene products from
[...] species to diverse pathogens such as
bacteria, fungi, and viruses imply the existence of common
[...] biochemical defense mechanisms.
Although these mechanisms remain to be uncovered,
[...] death of host cells at the site of attempted
infection, designated the hypersensitive response (HR),
accompanies many incompatible race-specific interactions
(Stakman, 1915; Staskawicz et al., 1995). Similarly,
resistance in barley to the common pathogen
Erysiphe graminis f. sp. hordei is in most analyzed
cases specified by dominant or semidominant
resistance genes and associated with a
HR (Mlx; Jørgensen, 1994).

Monogenic resistance mediated by recessive (mlo)
alleles of the Mlo locus is different. Apart from being
recessive, it differs from race-specific incompatibility to
single pathogen strains in that (1) it confers a broad
spectrum resistance to almost all known isolates of the
fungal pathogen, (2) mlo resistance alleles have been
obtained by mutagen treatment of any tested susceptible
wild-type (Mlo) variety, and (3) the resistance is apparently
durable in the field despite extensive cultivation
in Europe (Jørgensen, 1992). Finally, under pathogen-free
or even axenic conditions, mlo plants exhibit a
spontaneous leaf cell death phenotype, preceded by
characteristic wall appositions (Wolter et al., 1993).
Mutations have also been described in many other
[...] in which cell death phenotypes appear,
resembling those in defense responses to plant pathogens
([...] et al., 1983; Jones, 1994; [...] et al.,
1996). It has been suggested that at least some of these
mutations, collectively termed disease lesion mimics,
affect control mechanisms of plant defense. Both
recessively and dominantly inherited lesion mimic mutants
have been analyzed for indicators of defense responses
in Arabidopsis thaliana (Dietrich et al., 1994;
Greenberg et al., 1994; Weymann et al., 1995). Apart from
the onset of cell death in the absence of pathogens,
defense functions such as plant cell wall modifications
and the accumulation of defense-related gene
transcripts and phytoalexins have been observed. The
mutants (lsd1 to lsd7 and acd2) were found to exhibit
elevated resistance to a bacterial (Pseudomonas syringae)
and a fungal (Peronospora parasitica) pathogen.
Lesion mimic mutants are not restricted to foliar tissue.
Recessive alleles of the soybean Rn locus exhibit, under
axenic conditions, HR symptoms in the root, accompanied
by the accumulation of defense-related proteins
and the phytoalexin glyceollin (Kosslak et al., 1996).
Homozygous rn plants exhibit increased tolerance to
root-borne infection by the fungal pathogen Phytophthora
sojae. The findings suggest the activation of an
at least partially overlapping set of biochemical events in
pathogen-triggered race-specific resistance and during
pathogen-independent cell death in several disease lesion
mimic mutants.

We describe here the molecular isolation of the Mlo
gene as a first step toward a molecular interpretation
of broad spectrum resistance mediated by recessive
[...]. The gene encodes a member of
a novel protein family apparently restricted to plants.
We discuss the possible dual function of the Mlo protein
in down-regulating leaf cell death and pathogen defense
functions.

Results 

We had previously identified RFLP markers closely
linked to Mlo on barley chromosome 4 in mlo backcross
(BC) lines containing mlo alleles from six genetic
backgrounds (Hinze et al., 1991). The identification of a 2.7 cM
(centiMorgan) RFLP interval (bAL88-bA011) containing
Mlo based on the cross Carlsberg II Mlo x Grannenlose
Zweizeilige mlo-11 opened a route to isolate the gene
via positional cloning (Figure 1; RFLP map). However,
because the barley genome has a very unfavorable ratio
of genetic and physical distances (approximately 3 Mb/cM;
Bennett and Smith, 1991; Becker et al., 1995), we
applied AFLP (Amplified Fragment Length Polymorphism)
marker technology (Vos et al., 1995) to increase
the DNA marker density around Mlo and to generate a
genetic map with a resolution better than 0.05 cM. We
aimed to physically delimit the gene with flanking DNA
markers on single large insert size genomic clones, an
approach that has been termed "chromosome landing"
(Tanksley et al., 1995).

Targeted Search for AFLP Markers
We selected AFLP markers around Mlo by searching for
polymorphic DNA fragments between an mlo BC line
(BC7 Ingrid mlo-3) and DNA from the recurrent parent
(Ingrid Mlo). The BC7 Ingrid mlo-3 line was previously
shown to carry a small introgressed DNA segment on
barley chromosome 4 (Hinze et al., 1991). The donor
parent of the BC line represents a different genetic
background (cultivar Malteria Heda mlo-3) in comparison to
the recurrent parent line. In parallel, we established a
second segregating F2 population from the cross Ingrid
Mlo x BC7 Ingrid mlo-3, formally representing an eighth
backcross. To further narrow down the chromosomal
interval for DNA marker identification to approximately
3 cM, pooled DNA from resistant (mlo) and susceptible
(Mlo) F2 individuals were included in the search for AFLP
markers besides DNA of the parental lines (see Experimental
Procedures and Giovannoni et al., 1991). All possible
PstI/MseI primer combinations (1,024) extending
into genomic sequences up to nucleotide positions +2
and +3 and 880 EcoRI/MseI primer combinations
(+3/+3) were tested. A total of 38 AFLP marker candidates
were identified.



High Resolution Mapping

A three step procedure was chosen to construct the
high resolution AFLP map. First, we were able to position
21 of the identified candidate AFLP markers to opposite
sides of Mlo by using recombinants for flanking RFLP
markers that had been detected among a small number
of 70 F2 individuals (data not shown). The remaining 17
AFLP markers could not be separated from Mlo using
this population size. In a second step, two codominant
AFLP markers on opposite sides of Mlo were chosen to
screen 2,022 F2 segregants for recombination events in
the interval. 76 recombinants were identified, and their
genotype at Mlo was determined by testing selfed F3
families with powdery mildew isolate K1 that is not virulent
on homozygous mlo genotypes. In a third step, an
AFLP analysis was carried out with each of the remaining
17 candidate AFLP markers to determine their position
relative to Mlo based on the 76 recombinants. The crucial
result was the identification of a DNA marker cosegregating
with Mlo (Bpm16) and two flanking markers
(Bpm2 and Bpm9) at a distance of 0.24 and 0.4 cM,
respectively (Figure 1; AFLP map).
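The link between population size and map resolution can be illustrated with a short calculation. This is a rough sketch under stated assumptions that are not taken from the paper: each F2 plant is treated as sampling two gametes, and for very small intervals the map distance in cM is approximated as 100 times the recombination fraction. The function names are hypothetical.

```python
def smallest_resolvable_cm(n_f2):
    """Distance (cM) corresponding to one recombinant gamete among 2 * n_f2."""
    if n_f2 <= 0:
        raise ValueError("population size must be positive")
    return 100.0 / (2 * n_f2)


def map_distance_cm(recombinant_gametes, n_f2):
    """Approximate map distance (cM) for small recombination fractions."""
    if n_f2 <= 0:
        raise ValueError("population size must be positive")
    return 100.0 * recombinant_gametes / (2 * n_f2)


if __name__ == "__main__":
    # 2,022 F2 segregants are reported in the text.
    print(round(smallest_resolvable_cm(2022), 4))
```

Under these assumptions, a population of 2,022 F2 plants corresponds to roughly 0.025 cM per recombinant gamete, which is in line with the stated aim of a resolution better than 0.05 cM.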

Physical Delimitation of Mlo

A large insert yeast artificial chromosome (YAC) library
was constructed with genomic DNA of cultivar Ingrid
Mlo using vector pYAC4 (Burke et al., 1987). The library
comprises 40,000 clones with an average insert size of
500 kb and represents approximately four barley genome
equivalents (construction and characterization of
this library will be published elsewhere). Four YAC
clones (YHV417-D1, YHV400-H11, YHV322-G2, and
YHV303-A6) were isolated by an AFLP screen specific
for marker Bpm16, which cosegregated with Mlo. AFLP
analysis indicated that three of these clones (YHV400-H11,
YHV322-G2, and YHV303-A6) also contained both
flanking marker loci (Bpm2 and Bpm9). These findings
implied physical delimitation of Mlo on three YAC clones.
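The library figures above can be checked with simple arithmetic. The sketch below assumes a genome size that is merely back-calculated from the text's own numbers (40,000 clones x 500 kb / 4 equivalents = 5,000,000 kb); it is not an independent measurement. The probability formula is the standard random-sampling approximation for clone libraries and ignores cloning bias and chimeric clones. Function names are hypothetical.

```python
def genome_equivalents(n_clones, avg_insert_kb, genome_kb):
    """Total cloned DNA divided by genome size."""
    return n_clones * avg_insert_kb / genome_kb


def prob_locus_represented(n_clones, avg_insert_kb, genome_kb):
    """Chance that a given locus is in at least one clone: 1 - (1 - f)^N."""
    f = avg_insert_kb / genome_kb
    return 1.0 - (1.0 - f) ** n_clones


if __name__ == "__main__":
    # Genome size implied by the text's figures; an assumption, see lead-in.
    implied_genome_kb = 40_000 * 500 / 4
    print(genome_equivalents(40_000, 500, implied_genome_kb))
    print(round(prob_locus_represented(40_000, 500, implied_genome_kb), 3))
```

With roughly fourfold coverage, finding about four clones for a single-copy marker, as reported for Bpm16, is what one would expect on average.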

We chose YHV303-A6 (insert size 650 kb; Figure 1)
for subcloning experiments into bacterial artificial chromosome
(BAC) vector pECSBAC4 containing a unique
EcoRI site (Shizuya et al., 1992; see Experimental Procedures).
Recombinant BAC clones containing the AFLP
locus Bpm16 were subsequently identified using the
cloned 108 bp PstI/MseI genomic Bpm16 fragment from
cultivar Ingrid Mlo as a probe in colony hybridization
experiments. One BAC clone, BAC F15, containing an
insert of approximately 60 kb was chosen for further
detailed studies (Figure 1; BAC F15). We found that
the recombinant BAC clone contained locus Bpm2 in
addition to the AFLP marker Bpm16, but not locus
Bpm9, indicating physical delimitation in centromeric
orientation to Mlo. Instead of constructing a BAC contig
between Bpm16 and Bpm9, we developed new polymorphic
markers from BAC F15 and mapped them using
template DNA of 25 recombinants (derived from the high
resolution mapping population described above) in the
interval Bpm2-Bpm9. A codominant XbaI/MseI polymorphism
(designated Bxm2) was identified between
the parental lines Ingrid Mlo and BC7 Ingrid mlo-3. The
analysis of the 25 recombinant individuals revealed a
position of Bxm2 in telomeric orientation from Mlo at a
[Figure 1 panel titles: RFLP map (Carlsberg II Mlo x Grannenlose Zweizeilige mlo-11); AFLP map (Ingrid Mlo x BC7 Ingrid mlo-3)]
Figure 1. Positional Cloning of Mlo

[...] has been mapped with increasing precision on the long arm of barley chromosome 4 using RFLP and AFLP markers. The
upper part of the figure represents genetic linkage maps of these markers relative to Mlo. Genetic distances are indicated in centiMorgans
(cM); the RFLP map is based on multipoint linkage analysis, and the AFLP map was calculated by two point estimates. RFLP markers
[...] F2 individuals derived from the cross Carlsberg II Mlo x Grannenlose Zweizeilige mlo-11 [...]
based on only 44 F2 individuals. [...] gene was delimited to a 2.7 cM [...]
([...] orientation) and bAL88 (centromeric orientation). AFLP markers (Bpm2, Bpm9, Bpm16, and Bxm2)
were identified and mapped as described in Experimental Procedures. Their genetic distance to Mlo is based on the cross Ingrid Mlo x BC7
Ingrid mlo-3. [...] YHV303-A6 (insert size 650 kb), containing the cosegregating marker Bpm16 and flanking loci (Bpm2
[...]) [...] section of the figure. The position of marker Bpm9 was only roughly estimated within the YAC clone as
indicated by the arrow. The insert of BAC F15 represents a 60 kb subfragment of this YAC as indicated in the lower part of the figure. The
approximate physical positions of AFLP markers Bpm2, Bpm16, and Bxm2 (spanning an interval of approximately 30 kb) as well as the location
of some rarely occurring restriction sites are indicated. Dashed lines below the schematic representation of BAC F15 DNA show the position
of the [...] DNA sequence contigs. The structure of the Mlo gene is shown schematically in the bottom part of the figure. Exons
[...] boxes. Positions of mutational events are indicated for the eleven tested mlo alleles. Mutant alleles carrying deletions
in their nucleotide sequences are marked with a [...].



distance of 0.1 cM (Figure 1; AFLP map). We concluded
that Mlo had been physically delimited on BAC F15 between
marker loci Bpm2 and Bxm2.

A Candidate Mlo Gene

DNA sequences of the approximately 60 kb insert of
BAC F15 were obtained from randomly chosen clones
of a plasmid sublibrary (see Experimental Procedures).
In parallel, a physical map was generated (Figure 1; BAC
F15). The map indicated that the flanking markers Bpm2
and Bxm2 are separated by approximately 30 kb. Rare
cutting restriction sites enabled us to assign larger sequence
contigs within BAC F15. We searched the available
sequence contigs for regions of high coding probability
(see Experimental Procedures). Only one sequence
contig of 5.8 kb, including the cosegregating marker
Bpm16, revealed an extensive region of high coding
probability.

We performed reverse transcriptase-polymerase
chain reactions (RT-PCR) with total leaf RNA derived
from cultivar Ingrid Mlo using a series of primers deduced
from regions that indicated high coding probabilities
and obtained in each case a distinct amplification
product (Experimental Procedures). Sequencing of the
largest RT-PCR products revealed a single extensive
open reading frame of 1,599 bp (Figure 2). 5' and 3'
[...] of the transcript were identified using rapid
amplification of cDNA ends (RACE) technology. The deduced
putative protein of 533 amino acids has [...]
found to any other described protein in the various databases,
but at least six putative membrane-spanning helices
indicated membrane association (for details, see
Discussion). We were unable to detect a signal in Northern
blot experiments containing total RNA with the labeled
RT-PCR probe, but a rare RNA transcript of approximately
2.0 kb length was clearly visible in the tested
Mlo, mlo-1, and mlo-3 genotypes when poly(A)+ RNA
was used (Figure 3). This transcript size is in agreement
with the combined data from RT-PCR and RACE analysis.
A comparison of the genomic DNA and RT-PCR-derived
sequences revealed 12 exons, each flanked by
the consensus splice site sequences (Figures 1 and 2).
Since marker Bpm16 is part of exon 11 and intron 11
and, as shown above, cosegregated with the resistance
phenotype, it represented a candidate Mlo gene. We
started genomic PCR-based sequencing of eleven
mutagen-induced mlo resistance alleles and their corresponding
wild-type DNAs (Experimental Procedures).
These mutants had been isolated within six different
genetic backgrounds. We identified nucleotide alterations
(point mutations or deletions) in all mutant
alleles that at the amino acid level result in either
amino acid substitutions or truncated versions of the
predicted wild-type protein (Table 1). Surprisingly, a
comparison among the wild-type gene sequences of
seven tested barley cultivars (Carlsberg II, Diamant,
Foma, Haisa, Ingrid, Malteria Heda, and Plena) indicated
not a single amino acid difference. Moreover, we observed
that at the nucleotide level the wild-type gene is
identical among 6 tested cultivars both in exon and
intron sequences, whereas cultivar Foma revealed 7 nucleotide
substitutions (2 in exon and 5 in intron sequences).
In conclusion, the comparative sequencing
of genomic DNA from various mutant mlo lines and their
respective Mlo wild-type cultivars supported our assumption
that we had identified Mlo.
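The reported open reading frame length, protein length, and approximate protein size can be cross-checked with a short calculation. This is a sketch only: the average residue mass of 110 Da is a common rule of thumb and an assumption here, not a value from the paper, and the function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; an assumption


def codons_in_orf(orf_bp):
    """Number of codons in a reading frame of the given length."""
    if orf_bp % 3 != 0:
        raise ValueError("reading frame length must be a multiple of 3")
    return orf_bp // 3


def approx_mass_kda(n_residues, avg_residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Rough protein mass estimate in kDa from residue count."""
    return n_residues * avg_residue_mass_da / 1000.0


if __name__ == "__main__":
    residues = codons_in_orf(1599)  # 1,599 bp reading frame from the text
    print(residues, round(approx_mass_kda(residues), 1))
```

A 1,599 bp frame corresponds to 533 codons, matching the 533 amino acids reported, and the rough mass estimate falls close to the 60 kDa given in the Summary.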

Characterization of Intragenic Recombinants
It was our intention to provide a chain of evidence
for the molecular isolation of Mlo that is not dependent
upon complementation experiments by the time-consuming
production of transgenic barley plants. We reasoned
that recombination events between two physically
separated mutation sites within the gene should
give rise to a wild-type allele and an allele carrying both
mutant sites. The former product of such rare intragenic
recombination events is predicted to confer susceptibility
upon powdery mildew attack only if the inactivation
of the described candidate gene above is a requirement
for resistance.

Based on this assumption, we performed intermutant
crosses with lines containing alleles mlo-1, mlo-5, and
mlo-8, generating in each case at least 10 F1 plants
(Table 2; note that mutant sites in mlo-1 and mlo-5 as
well as mlo-1 and mlo-8 are each separated by approximately
820 bp, as shown in Figure 1). The mutant alleles
originate from the genetic backgrounds Haisa (mlo-1)
and Carlsberg II (mlo-5 and mlo-8). F2 populations were
obtained by self-fertilization. F2 seedlings were screened
for rare disease-susceptible individuals after inoculation
with powdery mildew isolate K1, which is virulent on
each of the parental Mlo wild-type cultivars (note that
we were unable to select for products of intragenic recombination
events carrying both mutagenic events because
they are expected to exhibit a resistant phenotype).
Susceptible F2 individuals were identified with an
average frequency of 6 x 10^-4. This frequency is of
the same order of magnitude as in previous reports of
intragenic recombination events in plant genes (Salamini
and Lorenzoni, 1970; Freeling, 1978; Koornneef et al.,
1983; Dooner and Kermicle, 1986; Mourad et al., 1994).
In contrast, when comparable numbers of progeny from
selfings of each of the three mlo mutants were tested,
no susceptible seedlings were identified (Table 2). This
finding strongly indicated that the susceptible individuals
derived from the intermutant crosses were not due
to spontaneous reversion of the mlo alleles.

The inheritance of the susceptible F2 individuals was
tested after selfing in F3 families. Each of the F2 individuals
segregated in the F3 in the predicted ratio of 3 susceptible
to 1 resistant, indicating heterozygosity for alleles
conferring resistance and susceptibility in the F2.
Homozygous susceptible F3 progeny were isolated for
the majority of susceptible F2 individuals (see Experimental
Procedures). A molecular analysis of these was
performed using RFLP markers tightly linked (<4 cM)
[...] side of the Mlo locus to determine if restoration
of Mlo function was accompanied by flanking molecular
marker exchange (Figure 4). A compilation of the detected
alleles of all relevant genotypes is given in
Table 3. The compilation reveals that seven susceptible
individuals exhibited flanking molecular marker exchange,
indicating reciprocal crossover events (CO),
whereas five susceptible individuals revealed no flanking
marker exchange and therefore a non-crossover
type of recombination (NCO). The latter class could be
explained by a gene conversion or double crossover
event. The ratio of the two observed classes (7:5) is
compatible with the double-strand break repair model
for recombination (Szostak et al., 1983). The relative
position of the mutant sites in mlo alleles used in both
heteroallelic crosses (Figure 1) predicts that the CO type
recombination events are resolved unidirectionally with
respect to flanking marker alleles in order to restore
the Mlo wild-type allele. This is the case for all seven
analyzed CO type recombinants (Table 3).

DNA of the CO type recombinants was tested for the
presence of wild-type or mutant sequences. Genomic
[Figure 2 panel labels: helix I, helix II, helix III, helix IV, helix V, helix VI; the printed sequence is illegible in this copy]
Figure 2. Nucleotide and Deduced Amino
Acid Sequence of the Barley Mlo cDNA
The nucleotide and the deduced amino acid
sequence are based on the combined data
of RT-PCR and RACE obtained from experiments
using RNA of cultivar Ingrid Mlo. The
stop codon is marked by an asterisk, the putative
polyadenylation signal is underlined,
and the detected termini of RACE products
are indicated by arrows above the sequence.
Positions of introns as identified by comparison
with corresponding genomic clones are
labeled by triangles below the nucleic acid
sequence. Six membrane-spanning helices
predicted according to the MEMSAT algorithm
(see Discussion) are boxed in gray. A
putative nuclear localization signal (K-K-K-V-R)
[...] and two casein kinase II sites
(S/T-X-X-D/E) are shown in bold type.



PCR-based sequencing demonstrated in all cases restored
wild-type sequences. This observation strongly
suggested that the intragenic crossover event occurred
between nucleotides [...] and +821 in the cross mlo-1 x
[...] and between +3 and [...] in the cross [...] x
mlo-5 (numbers refer to genomic DNA sequences). Due
[Figure 3 lane labels: BC Ingrid mlo-1; BC Ingrid mlo-3]

Figure 3. Northern Blot Analysis of Mlo Transcript Accumulation

Total RNA (20 µg) and poly(A)+ RNA [...] of 7 day old uninfected
barley primary leaves [...] resistant (BC Ingrid mlo-1,
BC Ingrid mlo-3) [...] cultivars were isolated,
separated on a 1.2% formaldehyde gel, [...]
membrane (Hy[...]) [...] under [...]
conditions with the radioactively labeled [...] RT-PCR product
derived from Ingrid Mlo (Figure 2). [...] detected [...]
in the [...] containing poly(A)+ RNA. [...] signal corresponds to a
size of approximately 2 kb.

to homomorphism of the DNA in these intervals, we
were unable to further delimit the intragenic recombination
sites. In sum, the molecular analysis of seven intragenic
recombinants from two heteroallelic crosses provides
final proof that the described candidate
gene represents Mlo.

Discussion

[...] complementation experiments via transgenic [...].
The chain of evidence rests on [...] of Mlo to an
approximately 30 kb [...] markers and [...] mapping,
identifying mutation sites in all [...] mutants, and
demonstrating that susceptibility to the pathogen is coincident
with restoration of the [...]. Thus, the gene identification
rests [...] reciprocal molecular tests involving both gene [...]

[...] tagging-mediated gene isolation [...] inactivation by [...]
genome equivalent, which is almost double [...] of the human
genome size; Bennett and Smith, 1991). Important [...]
were the construction of a high resolution
genetic map and the application of AFLP marker
technology (Vos et al., 1995), enabling us to delimit the
target physically to 30 kb. [...]
of genome-wide and local ratios of genetic to physical
distances. Within the [...] the recombination frequency
was found to be [...] reciprocal crossover
events among 20,670 F2 [...] bp
interval covering the [...], assuming that a
comparable number of crossovers will have generated
recombinant chromosomes carrying both mutant
sites but were not detected [...] for susceptible
intragenic recombinants. [...] recombinants
were identified within the 30 kb interval bordered by
markers Bxm2 and Bpm2 [...]
similar to [...] cM [...] gene. These
ratios deviate from the genome-wide estimate (0.0003
cM/kb) by one to two orders of magnitude (Bennett and
Smith, 1991; Becker et al., 1995). Although the ratio of
genetic/physical distances is generally believed to be
much higher in telomeric [...] of plant chromosomes
(Heslop-Harrison, 1991), this argument does not apply
to Mlo because of its location [...]
arm of barley [...]. [...] findings could
be better explained with the [...] exceptionally high
frequencies of recombination [...] resulting in
[...] distribution of high and low recombination
frequencies along a chromosome [...] has been shown
for the Arabidopsis thaliana chromosome 4 (Salamini
and Lorenzoni, 1970; Freeling, 1978; Koornneef et al.,
1983; Dooner and Kermicle, 1986; Schmidt et al., 1995).

The deduced amino acid sequence of Mlo reveals
no homologies to any other described plant resistance
gene so far, supporting [...] a distinct mechanism
triggering pathogen defense. [...] shows
no striking [...]
or prokaryotic gene in [...]

Table 1. mlo Mutant Alleles
Allele   Cultivar        Mutation at nucleotide level             Effect on Amino Acid Level
mlo-1    Haisa           T[...] -> A                              Trp162 -> Arg
mlo-3    Malteria Heda   Deletion of 2 nucleotides (1188-1189)    Frame shift after Phe393
mlo-4    Foma            Deletion of 11 nucleotides (478-488)     Frame shift after Trp159
mlo-5    Carlsberg II    [...]                                    Met1 -> Ile
mlo-7    Carlsberg II    G677 -> A                                Gly226 -> Asp
mlo-8    Carlsberg II    A1 -> G                                  Met1 -> Val
mlo-9    Diamant         C[...] -> T                              Arg10 -> Trp
mlo-10   Foma            Deletion of 6 nucleotides (543-548)      2 amino acids (Phe182, Thr183) missing
mlo-13   Plena           [...]                                    Val30 -> Glu
mlo-17   Plena           C[...] -> T                              Ser31 -> Phe
mlo-26   Plena           T[...] -> A                              Leu270 -> His

Mutagen entries legible in the scan, in column order: X-rays, γ-rays, EMS, EMS, EMS, EMS, γ-rays, EMS, EMS; two entries are illegible.
Table 2. F2 Progeny Obtained from mlo Heteroallelic Crosses and Corresponding [...]

[...] Frequency of Susceptible Progeny

mlo-8 × mlo-1    [...]
mlo-1 × mlo-5    [...]    5.7 × 10^[...]
mlo-5 × mlo-7    [...]
[...]            14,474   [...]
mlo-7            [...]    6.2 × 10^[...]
mlo-5            5,498    0
mlo-8            8,435    0

Crosses are given female × male.



GenBank, SWISS-PROT). However, highly significant
homologous sequences have been identified both in
[...] databases from rice and Arabidopsis thaliana
(EMBL/GenBank accession numbers D24131, D24287,
N37544, H76041, T22145, T2214[...], and T88073). In addi-
tion, we have isolated cross-hybridizing genomic clones
from barley and rice containing highly homologous DNA
sequences (data not shown). This strongly suggests that
the Mlo protein is likely to represent a member of a novel
protein family and implies a conserved function among
monocot and dicot plants. We failed to detect homolo-
gous sequences in either the human, mouse, or Caeno-
rhabditis EST databases. Homologous sequences were
also not detected in the Saccharomyces cerevisiae ge-
nome for which complete DNA sequence information is
available (Dujon, 1996). Thus, Mlo is likely to represent
a member of a novel protein family restricted to the plant
kingdom.

A close inspection of the predicted amino acid se-
quence reveals six hydrophobic stretches that are likely
to form at least six transmembrane helices (Figure 2).
The significance of this finding is supported by applying
three different algorithms for assessment of membrane-
anchored proteins, indicating in each case six mem-
brane-spanning helices (ALOM, Klein et al., 1985;
MEMSAT, Jones et al., 1994; TMpred
[http://ulrec3.unil.ch/software/TMPRED_form.html]).
In addition, a putative nuclear localization motif (NLS)
was found in exon 12, indicating a possible transport of
the protein into the nucleus (K-K-K-V-R; Nigg et al.,
1991). Two casein kinase II sites (S/T-X-X-D/E; Rihs
et al., 1991) are located immediately upstream of the NLS.
Casein kinase II sites are frequently found at distances
between 10 and 30 amino acids from NLS motifs and have
been shown to determine the rate of nuclear transport
(Rihs et al., 1991). However, because NLS motifs appear to
be insufficient to target membrane-bound proteins to the
nucleus (Soullam and Worman, 1995), detailed functional
studies are necessary for subcellular localization of the
protein.
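
The motif relationship described above (a K-K-K-V-R nuclear localization motif with S/T-X-X-D/E casein kinase II sites lying 10 to 30 residues away on its upstream side) can be illustrated with a minimal sketch. Everything beyond the two motif patterns is an assumption made for illustration: the function name, the toy sequence (which is not the Mlo sequence), and the choice to measure distance from the first residue of the kinase site to the first residue of the NLS.

```python
import re

# Motif patterns taken from the text; all names below are hypothetical.
NLS_MOTIF = re.compile(r"KKKVR")
# Zero-width lookahead so that overlapping S/T-X-X-D/E sites are all reported.
CK2_SITE = re.compile(r"(?=[ST]..[DE])")


def ck2_sites_upstream_of_nls(seq, min_dist=10, max_dist=30):
    """Return (site_start, nls_start, distance) for each casein kinase II
    site lying min_dist..max_dist residues upstream of an NLS motif.

    Distance is measured start-to-start (an assumption; the text does not
    define it). Positions are 0-based indices into seq.
    """
    nls_starts = [m.start() for m in NLS_MOTIF.finditer(seq)]
    ck2_starts = [m.start() for m in CK2_SITE.finditer(seq)]
    hits = []
    for nls in nls_starts:
        for site in ck2_starts:
            dist = nls - site
            if min_dist <= dist <= max_dist:
                hits.append((site, nls, dist))
    return hits


# Toy sequence built for the example only: one S-A-A-E site at index 6
# and one K-K-K-V-R motif at index 22.
TOY_SEQ = "M" + "A" * 5 + "SAAE" + "A" * 12 + "KKKVR" + "A" * 5

if __name__ == "__main__":
    print(ck2_sites_upstream_of_nls(TOY_SEQ))
```

A scan of this kind only flags candidate sites; as the text notes, functional studies are still needed to establish localization.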
The apparent clustering of the [...] in [...] (Figure 1)
may be the first [...] domains in the protein. [...]
allele, characterized by an 11 bp deletion in exon 4, might
have a special implication. The resulting frameshift is
predicted to shorten the length of the expressed Mlo



(A) Carlsberg II mlo-8 × Haisa mlo-1

(B) Haisa mlo-1 × Carlsberg II mlo-5



Figure 4. Southern Blot Analysis of Intragenic Recombinants
Derived from mlo Heteroallelic Crosses

The alleles of two RFLP markers flanking Mlo on opposite
sides of either susceptible F2 individuals or homozygous
susceptible and homozygous resistant progeny were
determined by Southern blot analysis. Plant DNA (10 μg) of
the individuals was digested with PstI (A) or HaeIII (B)
and hybridized with the radioactively labeled RFLP markers
WG114 (upper panel; maps 3.1 cM in centromeric orientation
to Mlo; see Figure 1) and ABG366 (lower panel; maps 0.7 cM
in telomeric orientation to Mlo; see Figure 1) according to
standard procedures. CO, crossover type of recombinants;
NCO, non-crossover type of recombinants.

(A) DNA of the parental lines mlo-8 and mlo-1 as well as 2
homozygous susceptible (S; Mlo Mlo) and [...] resistant
(R; mlo mlo) progenies derived from 2 susceptible F2 plants
(designated 1 and 2) were tested. [...]

[...] 7 susceptible F2 plants (designated 1 to 7) were
tested [...] selected F3 individuals from F3 families
obtained by selfing the susceptible F2 individuals 1-7. DNA
was analyzed from 2 further susceptible individuals of this
heteroallelic cross only in the F2 generation [...]
Testcrosses (a)   Susceptible   Parental Genotype       Parental Genotype       Type of
                  Plants        in Centromeric          in Telomeric            Recombination
                                Orientation to Mlo (b)  Orientation to Mlo (c)

mlo-8 × mlo-1     [...]         mlo-1                   mlo-8                   CO
                  [...]         mlo-1                   mlo-8                   CO
                  [...]         mlo-8                   mlo-8                   NCO
mlo-1 × mlo-5     [...]         mlo-1                   mlo-5                   CO
                  [...]         mlo-1                   mlo-5                   CO
                  [...]         mlo-5                   mlo-5                   NCO
                  [...]         mlo-1                   mlo-5                   CO
                  [...]         mlo-5                   mlo-5                   NCO
                  [...]         mlo-5                   mlo-5                   NCO
                  [...]         mlo-1                   mlo-5                   CO
                  [...]         mlo-5                   mlo-5                   NCO
                  [...]         mlo-1                   mlo-5                   CO

CO, crossover type; NCO, noncrossover type.
(a) Crosses are given female × male.
(b) Deduced from alleles of RFLP marker WG114 [...]
[...] F2 individuals; in all other cases homozygous [...]
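
The CO/NCO column of this table follows a simple pattern, which the sketch below makes explicit: a recombinant is listed as crossover type (CO) when the flanking markers on the centromeric and telomeric sides of Mlo come from different parents, and as non-crossover type (NCO) when both come from the same parent. The rule is inferred from the table rather than stated in the text, and the allele labels are read from the OCR-damaged columns as mlo-1, mlo-5 and mlo-8, which is an assumption. All identifiers are hypothetical.

```python
from collections import Counter

# (centromeric parental genotype, telomeric parental genotype, listed type)
# Rows transcribed from the table above; allele labels are an assumed reading.
RECOMBINANTS = [
    ("mlo-1", "mlo-8", "CO"),
    ("mlo-1", "mlo-8", "CO"),
    ("mlo-8", "mlo-8", "NCO"),
    ("mlo-1", "mlo-5", "CO"),
    ("mlo-1", "mlo-5", "CO"),
    ("mlo-5", "mlo-5", "NCO"),
    ("mlo-1", "mlo-5", "CO"),
    ("mlo-5", "mlo-5", "NCO"),
    ("mlo-5", "mlo-5", "NCO"),
    ("mlo-1", "mlo-5", "CO"),
    ("mlo-5", "mlo-5", "NCO"),
    ("mlo-1", "mlo-5", "CO"),
]


def classify(centromeric, telomeric):
    """Flanking markers from different parents imply an exchange of flanks
    (CO); flanking markers from the same parent imply none (NCO)."""
    return "NCO" if centromeric == telomeric else "CO"


# Re-derive the type of each recombinant and tally the two classes.
inferred = [classify(cen, tel) for cen, tel, _ in RECOMBINANTS]
counts = Counter(inferred)

if __name__ == "__main__":
    for (cen, tel, listed), got in zip(RECOMBINANTS, inferred):
        print(f"{cen:6s} {tel:6s} listed={listed:3s} inferred={got}")
    print(dict(counts))
```

Under this reading, the inferred type agrees with the listed type for every row.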



[...] a complete functional inactivation of the protein.

This study shows that broad spectrum resistance to the
powdery mildew fungus is caused by a defective [...] of key
questions [...] function of the Mlo protein [...] its
homologues. Based on experimental evidence, we propose two
hypotheses. In the first model, Mlo would have a negative
control function in leaf cell death, since punctate dead
cell leaf lesions appear even in axenically grown seedlings
carrying different mlo alleles (Wolter et al., 1993). [...]
in animals (Raff, 1992; White, 1996), Mlo would suppress
[...] scenario, resistance would have to be envisaged as a
consequence of deregulated pcd. The [...] of 95 [...] been
studied on the basis of an allelic [...], 1988). The
defective alleles could be classified according to
gradually different infection phenotypes upon infection of
a mixture of nine powdery mildew isolates. Only three
mutant alleles were found to exhibit an intermediate
infection phenotype (i.e., [...]) and revealed no
macroscopically detectable leaf [...]. In contrast, most
efficient resistance alleles exhibit [...] double mutants
as [...] by a failure for trypan blue uptake in the latter
genotype (P. S.-L. et al., unpublished data).

In our second model, the Mlo protein would have a [...]
negative regulatory function by down-regulating multiple
defense-related functions. Spontaneous [...] genotypes
would merely represent the fatal end of [...] accumulating
[...] of defense responses [...] supported by the
chronological order of defense-related events in mlo
genotypes in the absence of pathogens. Cell wall
appositions [...] day old seedlings, trypan blue positive
leaf cell patches appear, indicating commitment to cell
death, and [...] become macroscopically detectable
(P. S.-L. et al., unpublished data). However, [...]
resistance is fully functional in 5 day [...] earliest time
point to carry out a resistance test for technical reasons.
Thus, establishment of defense-associated events in
pathogen-free mlo genotypes is not a requirement for
effective resistance to attempted powdery mildew attack.
We [...] or up-regulates the responsiveness of the seedling
for the onset of pathogen defense.

This is an important difference to all but one
character[...] lesion mimic [...] Arabidopsis as well as to
the [...]bean [...] mutants, in which elevated resistance
to pathogens is either dependent on lesion [...] or is
expressed concomitantly with [...] appearance of dead cell
lesions (lsd6 and lsd7; Weymann et al., 1995). So far, only
the Arabidopsis [...] lesion mimic [...] resistance at the
prelesion state (Dietrich et al., 1994). In contrast to the
determinate and punctate growth of lesions in mlo leaves,
[...] is indeterminate in lsd1, consuming the entire leaf.
[...] cannot be [...]



[...] the host cell survives the attack (Jørgensen and
Mortensen, 1977; Wolter et al., 1993). A priming of defense
functions in mlo plants would make it [...] defense
responses in the Mlo genotype (e.g., CWA formation) become
efficient (e.g., through increased speed and/or reduced
response times) but that the early developmental arrest of
the pathogen is insufficient to [...] of the host cell
[...].

Both of these two seemingly different functions of the
Mlo protein could be explained by assuming that the [...]
function [...] down-regulat[...] onset of [...] of multiple
defense functions. We [...] biochemical characterization of
Mlo, the analysis of the unique collection of mlo mutants,
and the identification of proteins with which it physically
[...]

Experimental Procedures

Plant Material
[...] Susceptible F2 plants that showed heterozygosity for
RFLP [...]

Powdery Mildew Infection Tests
[...] et al., 1996). The genotype at Mlo of recombinants
used for the [...]

AFLP Analysis
[...] was carried out as described by Vos et al. [...].
[...] four DNA [...] has been used: from the susceptible
parent cultivar Ingrid Mlo, the resistant parent BC Ingrid
[...], a pool of 2 resistant [...] individuals ([...]) and
a pool [...]. Amplified genomic fragments repre[...]
amplification products were isolated after agarose gel
[...]

The YAC library of barley cultivar Ingrid was established
[...] pYAC4 vector (Burke et al., 1987) and yeast strain
[...] of the library construction and its characterization
will be described elsewhere. Screening for YAC clones
containing marker [...] YHV303-A6, total DNA of this yeast
clone was used. After [...] DNA fragments in the size range
of 50 kb were recovered and subcloned in the pECSBAC4
vector. Clones carrying YHV303-A6 [...]. Subsequently,
labeled recombinant chromosome [...] enrichment by
preparative pulsed-field gel electrophoresis.

DNA Sequencing of BAC F15
DNA of BAC F15 was isolated by an alkaline lysis large
scale plasmid preparation according to Sambrook et al.
(1989) [...] by high pressure [...] reaction. DNA fragments
[...] were isolated from agarose gels using a DNA isolation
kit (Jetsorb, Genomed Inc., USA), subcloned into the
pBluescript SK vector (Stratagene), and propagated in
E. coli DH5α. Clones carrying BAC F15 [...] by
hybridization using [...] as a probe. Sequencing reactions
were [...] described above. Evaluation of the sequencing
data [...]

PCR-Based Sequencing of Alleles at Mlo
Plant chromosomal DNA for this purpose was isolated
according to Chunwongse et al. (1993). DNA sequences of
alleles of the different barley varieties, mlo mutants, BC
lines, and intragenic recombinants used in this study were
obtained by [...] (each 40[...] bp length) were amplified by
PCR (35 cycles, 60°C annealing temperature). After
preparative agarose gel electrophoresis and isolation of
the [...] products [...] (Genomed Inc., USA), [...]
reamplified. [...] products were subsequently purified from
nucleotides and oligonucleotides (Jetpure, Genomed Inc.,
USA) and used as a template in DNA sequencing reactions
(see above). All DNA sequences of mutant alleles and
corresponding regions of the parental lines and the
intragenic recombinants were derived from both strands and
confirmed in independent sets of experiments.

[...] Rapid Amplification of cDNA Ends (RACE)
[...] was performed using the SUPERSCRIPT [...] system for
first-strand cDNA synthesis ([...]). First-strand cDNA
synthesis was primed by an oligo(dT) primer. The putative
coding region of the Mlo gene was [...] annealing
temperature). The resulting product was analyzed by direct
sequencing. 5' and 3' ends of the Mlo cDNA were [...]
MARATHON cDNA amplification kit ([...]) [...]
CATGCTGATG[...]CTCAGA) [...] cloned into pBluescript SK
(Stratagene). Ten 5' end and eight 3' end clones were
chosen for DNA sequence analysis.
I. Introduction

Nuclear protein localization is a relatively new topic
in the general field of protein transport. Despite the late
start, many significant advances have already been
made, and the study of nuclear protein transport is now
precociously making the transition from an immature
stage of observation and definition to a more mature
stage in which meaningful statements about molecular
mechanisms can be made. This rapid maturation can be
attributed to a wide-spread interest in nuclear protein
localization and the fortunate circumstance that the site
of nuclear import, the nuclear pore complex, is unam-
biguous. Here we review the experimental approaches,
findings and conclusions which have shaped the current
understanding of nuclear protein localization. The
reader is referred to other reviews for additional infor-
mation and perspectives on nuclear import [11,31,55,
125,137], the nuclear envelope [45,49,67,98] or the
nucleus in general [68,109].

Abbreviations: BSA, bovine serum albumin; HSA, human serum
albumin; HSP, heat shock protein; IIF, indirect immunofluorescence;
NEM, N-ethylmaleimide; NLS, nuclear localization signal; NPC,
nuclear pore complex; WGA, wheat germ agglutinin.

Nuclear proteins are synthesized in the cytoplasm 
and posttranslationally transported across the nuclear 
envelope which separates the cytoplasm from the 
nucleoplasm [31,41]. The nuclear envelope consists of 
two concentric membranes (the inner and outer mem- 
branes), nuclear pore complexes and the nuclear lamina 
[49]. The inner and outer membranes, which appear by 
electron microscopy as conventional lipid bilayers, are 
separated by a so-called perinuclear space which is 
continuous with the lumen of the endoplasmic reticu- 
lum (ER). The outer membrane is continuous with the 
ER and is functionally similar. The nuclear pore com- 
plex is a large proteinaceous, grommet-like structure 
that traverses the nuclear envelope at sites where the 
inner and outer membranes are fused, thereby creating 
an aqueous channel through which proteins (and RNA) 
cross the nuclear envelope. Transport into the nucleus is 
thus most likely fundamentally different from the trans- 
port of proteins into other organelles, or to the cell 
exterior, in which transported proteins presumably pass 
directly through the hydrophobic environment of a 
membrane [148]; a large, transport-mediating pore 
structure such as the nuclear pore complex is unique to 
the nuclear envelope. The nuclear lamina is a filamentous
protein meshwork which lines the nucleoplasmic
surface of the inner membrane. Both the inner
membrane and the nuclear pore complex are anchored 
to the lamina. 

Transport across the nuclear envelope is an active 
process mediated by a nuclear localization signal (NLS) 
contained within the transported protein. Nuclear lo- 
calization signals interact with nuclear pore complex or, 
possibly, cytoplasmic components. 

II. Experimental Approaches 

II-A. In vivo

The early studies of nuclear import relied heavily on 
in vivo approaches. In vivo approaches have generally 
consisted of delivering a wild type or mutant nuclear 
protein to the cytoplasm of an appropriate host cell and 
subsequently determining its cellular location. Delivery 
methods include transfection, transformation, and mi- 
croinjection. Methods for detecting a protein's cellular 
location include indirect immunofluorescence, histo- 
chemical stains, subcellular fractionation, electron mi- 
croscopy, and functional assays. 

A nuclear import substrate is frequently introduced
into the desired cells by transfection. In higher
eukaryotes, transient transfection assays are most commonly
employed [51,59,75,117,151,161]. In studies performed with
yeast, plasmid DNA encoding a nuclear protein of interest
is introduced by transformation and stably transformed
cells are isolated by the ability of plasmid borne markers
to complement auxotrophic mutations
[9,64,91,99,100,104,136,145]. Less frequently, nuclear
import substrates have been delivered by expression during
viral infection [79,83,155].

Microinjection is an alternative route for introducing
nuclear import substrates. In several cases, cloned DNA
has been microinjected into the nucleus to serve as a
template to produce the desired substrate [26,76]. This
method allows one to readily assay a variety of mutants
without purifying the substrates. More often, proteins
are microinjected. Proteins used in microinjection
experiments have been isolated from the nucleus or
cytoplasm of the original organism [10,29,32,33,60] or from
E. coli [16,93,124], have been translated in vitro [23] or
have been synthesized by chemically cross-linking
nuclear localization signal peptides to a carrier protein
(usually serum albumin) [18,56,84,85,86,152,159]. These
microinjection experiments have shown that a nuclear
protein does not lose import competence, e.g., by
proteolytic removal of a signal, nor does it need to be
synthetically modified in the eukaryotic cell to be
imported. A variety of cell types, including monkey cells
[75,83,93,121], rodent cells [124,152], human cells
[18,159], kangaroo cells [15] and Xenopus oocytes
[16,23,26,33], have been amenable to microinjection for
nuclear import studies. Advantages of microinjection
are that it does not depend on a developed transformation
system nor does it require a cloned gene.

Another common substrate which has yielded important
information in microinjection experiments has
been protein-coated gold particles [3,34,35,40,105,121].
Studies with protein-coated gold particles have shown
very incisively that proteins enter the nucleus through
the nuclear pore complex [40].

Following delivery of a nuclear import substrate,
various detection methods allow one to ascertain the
cellular location of the protein being studied. Detection
methods, however, are generally constrained by having
to look directly at the nuclear and cytoplasmic
compartments of the cell; one does not have the luxury of
being able to assay nuclear localization by a molecular
weight shift in a gel since there is no
localization-associated processing event (proteolytic
processing or glycosylation) in nuclear import. A flexible
and widely-used detection technique is indirect
immunofluorescence (IIF). For peptides too short to be
detected by antibodies, or for proteins with no available
anti-sera, gene fusions have been commonly constructed with
a number of different antigenic tags. These fusions serve
several purposes in addition to providing an antigenic tag;
they can add to the import substrate a large cytoplasmic
protein that blocks nuclear entry by simple diffusion,
they can increase the stability of short peptides and
protein fragments and they can provide easily assayable
enzymatic activity. Gene fusions have most frequently been
constructed with the E. coli enzyme β-galactosidase
[16,64,76,91,99,100,118,124,136,145]. Fusions to pyruvate
kinase [25a,76,94a,122], α-globin [...6], galactokinase
[94], dihydrofolate reductase and chloramphenicol
acetyltransferase [94a] have also been described. These
fusion proteins are usually detected by IIF using
antibodies directed against the antigenic tag protein,
although, in one case, β-galactosidase fusion proteins were
also localized by a histochemical assay [...24]. Methods to
quantify the immunofluorescent signal in different
compartments of the cell have also been recently developed
[63,106]; this is particularly useful for comparing
localization of the same antigen in different mutant
backgrounds. Fluorescence microphotolysis has been used to
monitor the actual movement of proteins into the nucleus
[116].
Often, radiolabeled proteins have been microinjected
into Xenopus oocytes and their cellular location has
been traced by autoradiography of oocyte sections;
alternatively, they have been detected by manual separation
of nuclei and cytoplasm and subsequent quantisation of the
labeled protein in each fraction by scintillation counting
or gel electrophoresis and autoradiography
[10,16,26,29,33,56,60]. Manual enucleation to determine the
cellular distribution of a protein is common practice with
Xenopus oocytes due to the ease with which this can be done
with these large cells. More conventional subcellular
fractionation techniques have been used with yeast cells,
but less reliably [64,103]. In yeast cells where injection
of a labeled protein is not possible, the protein being
studied is usually detected by its enzymatic activity. In
experiments in which protein-coated gold particles are
injected, the gold particle is detected by electron
microscopy [3,34,35,40,105,121].

Cellular location can also be indirectly inferred from
functional assays. These include viral replication, virus
plaque formation [75,83,132], regulation of gene expression
[62,64,135] and ribosomal assembly [145].

II-B. In vitro

The recent advent of in vitro systems for nuclear
protein localization has made it possible to ask previously
technically impossible questions. One of the most
productive in vitro approaches has capitalized on the
unique and unusual ability of Xenopus egg extracts to
reconstitute nuclei [107,108]. Isolated Xenopus or
mammalian nuclei from rat liver can be reconstituted, or
'healed', by the addition of Xenopus egg extract. In
addition, egg extracts can assemble artificial nuclei
around Xenopus sperm or bacteriophage λ DNA. Both
the artificially assembled nuclei and the reconstituted
natural nuclei exclude large non-nuclear proteins
(immunoglobulin and phycoerythrin) but efficiently import



fluorescently tagged nucleoplasmin or conjugates be- 
tween BSA and peptides bearing the SV40 T antigen 
NLS [105,107]. This system has been used to show that 
(1) nuclear import is ATP-dependent [108], (2) binding 
of WGA blocks nuclear import [43], (3) nuclear import 
can be experimentally separated into two steps, binding 
followed by an ATP-dependent translocation step [105] 
and (4) nuclei reconstituted in the absence of WGA-bi- 
nding proteins contain nuclear pores that permit the 
passage of dextrans (free diffusion) but no longer im- 
port proteins [42]. 

In vitro nuclear import systems have also employed 
isolated nuclei, purified from either rat liver [73,96], 
mouse liver [114], or S. cerevisiae [77,48]. The rat and
mouse liver-derived in vitro nuclear import systems
employ substrates (including nucleoplasmin, SV40 large
T antigen, adenovirus E1b, HSV thymidine kinase, and
mouse p53) biosynthetically labeled with [35S]methionine
or fluorescently tagged with fluorescein or phycoerythrin.
The import assays consist of incubating isolated
nuclei with import substrates, pelleting (in some
cases through a sucrose cushion) the nuclei from this 
incubation mixture, and then detecting the nucleus-as- 
sociated proteins by a fluorescent signal or by SDS- 
PAGE electrophoresis and autoradiography. The 
nucleus-associated proteins are judged to be inside the 
nucleus based on immunofluorescence [73,96], inextrac- 
tability by 1% Triton X-100 [96,114] or inaccessibility to 
immobilized trypsin [96,114]. Furthermore, cytoplasmic 
variants of the SV40 large T antigen fail to be imported, 
demonstrating specificity for a NLS [73,96]. These in 
vitro systems have demonstrated that nuclear associa- 
tion is time-, temperature- and ATP-dependent, arguing 
that import is an active transport process. Interestingly, 
in contrast to studies using assembled or reconstituted 
nuclei [43,106], in some of these in vitro studies nuclear 
import was not enhanced by a cytoplasmic extract 
[96,114] or inhibited by WGA [114]. 

The S. cerevisiae in vitro import systems use substrates
generated by in vitro transcription-translation of
the non-yeast substrates SV40 large T antigen and 
nucleoplasmin [77], or the yeast proteins MCM1 and 
STE12 [48]. In both cases, non-nuclear proteins are 
excluded from the nucleus whereas nuclear proteins are 
imported into the nuclei as shown by co-sedimentation 
through a sucrose cushion and protection from exter- 
nally-added protease. Import is time-, temperature- and 
energy (requires ATP hydrolysis) dependent. These yeast 
systems should further the characterization of nuclear 
import of the large number of yeast proteins whose 
import has been studied in vivo. Moreover, they should 
enable further characterization of nuclei from mutants 
that suffer an in vivo import defect (npl1/sec63) [131]
or have an altered nuclear pore structure. 

Lastly, one other in vitro approach has been the 
isolation and study of closed nuclear envelope ghosts 
TABLE I

Nuclear localization signals (NLS)

'Position of signal' refers to the region of the protein which was shown to actually contain a NLS. NLS sequences which are derived from a large
region should not be taken too literally since they are usually chosen based on homology with another NLS and therefore do not necessarily
represent an actual NLS. An indicated sequence may also, depending on how it was identified, represent more than or only part of a signal.

Yeast
Influenza virus
Yeast
Ribosomal
391
Adenovirus
Hepatitis B virus
Yeast
protein L29

Deduced signal sequence



Reference 



K-I-P-I-K 7
V-R-I-L-E-S-W-F-A-K-N-I 152
P-K-K-K-R-K-V 132
7
A-A-F-E-D-L-R-V-R-S 343
P-R-K-R 21
V-S-R-K-R-P-R-P-A 197
P-K-K-A-R-E-D 28
A-P-T-K-R-K 6
K-R-P-R-P 289
P-N-K-K-K-R-K 323
R-P-A-A-T-K-K-A-G-Q-A-K-K-K-K-L-D 172
K-K-K-I-K 517
7
R-V-T-I-R-T-V-R-V-R-R-P-P-K-G-K-H-R-K
G-K-K-R-S-K-A 33
K-A-K-R-Q-R 303
7
D-R-L-R-R 38
P-K-Q-K-R-K 221
V-R-K-K-R-K-T 537
A-K-K-S-K-Q-E 354
P-A-A-K-R-V-K-L-D 321
R-Q-R-R-N-E-L-K-R-S-F 374
T-K-K-R-K-L-E 422
P-K-T-R-R-R-P [...]
S-Q-R-K-R-P-P 17
R-L-P-V-R-R-R-R-R-R-V-P 373
G-R-K-K-R 52
V-R-T-T-K-G-K-R-K-R-I-D-V 421
R-K-F-K-K 642
7
?
R-R-N-R-R-R-R-W 45
?
I-K-Y-F-K-K-F-P-K [...]
P-R-E-S-G-K-K-R-K-R-K-R-L-K-P-T [...]
K-K-K-K-K 624
P-P-K-K-R 46
P-K-K-K-K-K [...]
?
S-K-R-V-A-K-R-K-L 133
P-L-L-K-K-I-K-Q 521
P-P-Q-K-K-I-K-S 344
P-Q-P-K-K-K-P 322
F-K-R-K-H-K-K-W-S[...]N-K-R-A-V-R-R 267
S-K-C-L-G-W-L-W-G 29
G-K-R-K-N-K-P-K 313
?
K-T-R-K-H-R-G 12
K-H-R-K-H-P-G 29
87
53

58

78

25, 144
93
161
18

59 

111 

20, 95, 115 
120 



147 
101 

91 

25a 

25a 

25a 

25a 

25a 

112 

14 

145 



[123]. These vesicles rapidly import histones and
nucleoplasmin (as well as the non-physiological sub-
strate poly-lysine) while effectively excluding non-
nuclear proteins. Somewhat curiously, this import was
stimulated by GTP or GDP but not by any other
nucleotide. These vesicles also support ATP-stimulated
export of polyA containing mRNA.

II-C. Genetics

The full potential of a genetic approach in the study
of nuclear import has not yet been realized. A genetic
approach should provide detailed in vivo information
on nuclear import and should facilitate over-production
and purification of components of the import machin-
ery for reconstitution studies. Genetic studies take two
routes, isolation of the genes encoding components that
play (NLS binding proteins) or may play (constituents
of the NPC) a role in import, and genetic screens and
selections to identify mutants defective in nuclear im-
port.

Several features make S. cerevisiae the most attractive
candidate for genetic analysis, not the least of
which are the well developed genetics of this organism.
Additionally, nuclear targeting signals have been
identified in several yeast proteins (see Table I).
Moreover, although few yeast proteins contain the canonical
SV40 T antigen NLS, this signal can target proteins to the
nucleus in yeast [9,104], arguing that some components
are conserved and that findings in yeast should be
generally applicable to other eukaryotes. Nuclear
localization signal binding proteins have been detected in
yeast [88,138], and reagents with which to clone their
respective genes should soon be available. Although
purification of yeast nuclei is somewhat problematic
due to their small size and fragility, functional yeast
nuclei can be obtained for in vitro studies [48,77].
Antibodies have been raised against isolated yeast nuclei
[...5,72], as well as against nuclear pore complex
preparations [4]. Additionally, some antibodies against
mammalian pore complex proteins cross-react with yeast
nuclear envelope proteins (Ref. 6, L. Davis & G. Fink,
pers. comm.). These antibody reagents provide a means
to isolate genes encoding nuclear pore and envelope
proteins, some of which could play a role in nuclear
import; this should, in turn, allow the easy isolation of
[...] mutants. The recently developed in vitro
systems for import of nuclear proteins into purified
yeast nuclei [48,77] should allow detailed mechanistic
studies of such mutants.

The strength of yeast genetics rests not only on the
ability to isolate proteins and then obtain and manipulate
the desired gene, but also to design genetic screens
and selections that can leapfrog many years of effort in
isolating rare and difficult to manage membrane proteins
of interest. Although unsuccessful, one early
genetic screen to isolate mutants defective in nuclear
import of ribosomal proteins proves illuminating (J.
Teem & G. Fink, pers. commun.). S. cerevisiae is sensi-
tive to the translation inhibitor cycloheximide; resistant
mutants (cyh2) express a variant ribosomal protein that
no longer binds cycloheximide. Importantly, resistance
is recessive. Teem and Fink expressed the CYH2 gene,
encoding the cycloheximide sensitive version of the
ribosomal protein, from a galactose-inducible promoter
and reasoned that a cell conditionally defective (ts) in
nuclear import would be unable to assemble cyclo-
heximide sensitive ribosomes (ribosomes are assembled
in the nucleus) when grown in galactose at non-permis-
sive temperature. Mutants which failed to become sensi-
tive to cycloheximide were obtained by this screen but
all turned out to be defective in secretion (sec mutants);
it remains enigmatic how these mutants satisfied the
selection.

An alternative approach has been to fuse a NLS to
an otherwise non-nuclear protein and mistarget the
protein to the nucleus [131]. In this case, the SV40 large
T antigen NLS or the N-terminus of nuclear GAL4 was
fused to cytochrome c1, a protein normally found in
mitochondria. By a functional assay and indirect
immunofluorescence, Sadler et al. [131] concluded that the
hybrid proteins are now mislocalized to the nucleus. By
immunofluorescence, the hybrid proteins are located at
the rim or periphery of the nucleus and, by the functional
assay, do not complement a cytochrome c1 mutation
and do not allow growth on glycerol. Conditional
(ts) mutants that express the hybrid protein and grow
on glycerol at permissive temperature were isolated. The
first to be extensively characterized, npl1, is an allele of
sec63, previously identified as having a role in early
import into the endoplasmic reticulum (ER) [30,128].

There are three plausible explanations for the ap- 
parent role of SEC63 in nuclear import. First, the 
protein may play a role in both early ER and nuclear 
import. SEC63 has homology to the E. coli heat shock 
protein DnaJ, and thus its role could be to act as a 
chaperone to promote import. Second, because the ER 
is continuous with the nuclear envelope, nuclear pore 
components could reach their ultimate destination 
through the ER. A mutation affecting ER entry might 
thus disrupt pore assembly and consequently block 
nuclear import by an indirect effect. Third, from the
immunofluorescence data presented, the NLS-cytochrome c1
hybrid protein in a wild-type strain is located
at the nuclear periphery. Since the lumen of the ER is
continuous with the nuclear envelope, the fusion protein
could actually be in the ER and not in the nucleus
proper. Thus, the sec63 mutation may simply block
entry of the hybrid protein to the ER, as one would
expect based on the known phenotype of sec63. However,
this last explanation cannot easily be reconciled
with the finding that a mutation in a putative NLS of
GAL4 in the GAL4-cytochrome c1 hybrid allows this
hybrid protein to reach the mitochondria.

Lastly, we note one final approach that has not yet 
been applied to nuclear import, genetic suppressor anal- 
ysis. Here, one begins with a partially crippled protein 
and selects compensatory mutations that mitigate the 
original defect. Often these mutations identify inter- 
acting components. One could envision screening for 
mutations that suppress a protein bearing a defective 
NLS. Among such suppressor mutants, one might ex- 
pect to find an altered NLS-binding protein. Similar 
approaches have been successfully applied to the study 
of protein secretion in E. coli [8]. However, in the case 
of nuclear import, this approach is complicated by the 
fact that, since a NLS is an integral part of a mature
protein, a mutation in a NLS could easily alter more 
than just a targeting function. Suppressors of most NLS 
mutations might thus have to suppress defects in two 
unrelated functions and would, most likely, not be of 
interest. 

III. Nuclear localization signals (NLS)
III-A. The signal 

Proteins are targeted to the nucleus by specific signals
(NLS) within the proteins' primary sequence [29].
These signals have been delineated for an expanding
body of proteins (Table I) by two general methods.
NLSs have been identified as sequences (1) that can, by
genetic or biochemical fusion, render a cytoplasmic protein
nuclear or (2) which, when deleted or mutated, no
longer promote nuclear uptake of the protein in which
they reside. As has been previously shown for signals
that direct proteins to other cellular compartments,
there is no single, strict consensus NLS, but there are
some general rules. Nuclear localization signals: (1) are
typically short sequences, usually not more than 8-10
amino acids; (2) contain a high proportion of positively
charged amino acids (lysine and arginine); (3) are not
located at specific sites within the protein; (4) are not
removed following localization; and (5) can occur more
than once in a given protein.
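
As a rough illustration of rules (1) and (2), the following sketch slides a short window along a sequence and reports windows that are rich in lysine and arginine. The function names, the window width of 7 and the threshold of 4 basic residues are illustrative assumptions, not values taken from the studies cited. The test sequence is SV40 T antigen residues 111-132 as quoted in this section. A basic-rich window is only a candidate; it says nothing about whether the segment actually functions as a NLS.

```python
# Hypothetical helper names; window width and threshold are assumptions.
BASIC = set("KR")  # lysine and arginine, per rule (2)


def count_basic(seq):
    """Return the number of lysine/arginine residues in seq."""
    return sum(1 for aa in seq.upper() if aa in BASIC)


def basic_rich_windows(seq, width=7, min_basic=4):
    """Return (start, window, n_basic) for every window of the given
    width that contains at least min_basic basic residues."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        n_basic = count_basic(window)
        if n_basic >= min_basic:
            hits.append((start, window, n_basic))
    return hits


if __name__ == "__main__":
    # Residues 111-125 followed by the minimal NLS (126-132).
    t_antigen_111_132 = "SSDDEATADSQHSTP" + "PKKKRKV"
    for start, window, n_basic in basic_rich_windows(t_antigen_111_132):
        print(start, window, n_basic)
```

Only the windows overlapping PKKKRKV pass the threshold; the acidic stretch 111-125 contains no lysine or arginine at all.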

Pioneering studies with the abundant Xenopus
nuclear protein nucleoplasmin clearly and elegantly
demonstrated that the C-terminal domain of this protein
is responsible for mediating nuclear import [33].
This work showed that sequences within a mature protein
direct nuclear localization and set the stage for
much to follow. Studies on the yeast MATα2 protein, a
transcriptional repressor, were the first to show that a
nuclear protein can target a fused non-nuclear protein
to the nucleus and the first to provide the actual
sequence of a NLS [64]. Genetic fusions coupled with IIF
and subcellular fractionation revealed that MATα2 can
target the prokaryotic enzyme β-galactosidase to the
nucleus and that only the N-terminal 13 amino acids of
MATα2 are sufficient for this targeting. Found within
this short stretch is the pentapeptide KIPIK which has
been shown to be important in nuclear import [63]. A
similar sequence of two positively charged amino acids
flanking three hydrophobic residues, one of which is a
proline, is present in several other yeast nuclear proteins
and is not present in any cytoplasmic proteins currently
known [61]. However, a peptide containing this sequence
cannot target a crosslinked carrier protein to the
nucleus in mammalian cells [18,84] nor is a homolog of
this sequence in yeast histone H2B involved in nuclear
import [99]. In contrast, the adenovirus pTP protein
contains a MATα2 NLS homolog, RLPVR, within a
sequence implicated in nuclear localization [161] and
nucleoplasmin contains a similar signal, RPAATK,
shown to be required for import [32].
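
The pentapeptide pattern described above (two basic residues flanking three hydrophobic residues, one of which is a proline) can be written as a small search. This is a sketch only: the set of residues treated as hydrophobic and the function name are assumptions made for illustration, and a match implies nothing about function, as the negative results cited above show.

```python
import re

# Assumption: residues treated as hydrophobic for this illustration.
HYDROPHOBIC = "AVLIMFWP"

# Lookahead so that overlapping candidates are all reported.
PATTERN = re.compile(r"(?=([KR][%s]{3}[KR]))" % HYDROPHOBIC)


def alpha2_like_motifs(seq):
    """Return (position, motif) for each basic-XXX-basic pentapeptide
    whose three central residues are hydrophobic and include a proline."""
    hits = []
    for match in PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        if "P" in motif[1:4]:
            hits.append((match.start(), motif))
    return hits


if __name__ == "__main__":
    for peptide in ("KIPIK", "RLPVR", "PKKKRKV"):
        print(peptide, alpha2_like_motifs(peptide))
```

KIPIK and RLPVR match the pattern, while the SV40 sequence PKKKRKV does not, in keeping with the view that the two signal types differ.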

Concurrent with the studies of MATα2, the nuclear
localization signal was determined for the SV40 large T
antigen. By deletions, point mutations, and gene fusions,
the sequence PKKKRKV (residues 126-132) was
defined as the SV40 T antigen minimal NLS [21,75,76,83].
Subsequently, it was shown that peptides bearing
this minimal sequence are sufficient to target a
cross-linked carrier protein to the nucleus following
microinjection [18,56,84,86]. Homologs of the SV40 NLS are
found in many other nuclear proteins (Table I).
Comparison of eight such homologs has generated the four
residue consensus Lys-Arg/Lys-X-Arg/Lys [18]; the
positive charge at the first position of this sequence (the
second lysine in the SV40 T antigen minimal NLS,
PKKKRKV) has been shown to be necessary for NLS
function [21,84]. In general, the most conserved features
seem to be several positively charged amino acids
(arginine or lysine) associated with a proline. These
conserved features are similar to those of the MATα2
signal; however, the T antigen signal and the MATα2
signal are historically considered to be two canonically
different signals.
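
The four residue consensus Lys-Arg/Lys-X-Arg/Lys translates directly into a pattern search. The sketch below is illustrative only; the function name is an assumption, and the test peptides are the SV40 and polyoma sequences quoted in this review.

```python
import re

# Lys, then Arg or Lys, then any residue, then Arg or Lys.
# A lookahead is used so that overlapping matches are reported.
CONSENSUS = re.compile(r"(?=(K[RK].[RK]))")


def consensus_hits(seq):
    """Return (position, tetrapeptide) for every consensus match."""
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(seq.upper())]


if __name__ == "__main__":
    print(consensus_hits("PKKKRKV"))  # SV40 large T antigen minimal NLS
    print(consensus_hits("PKKARED"))  # polyoma large T antigen
```

For PKKKRKV the search returns two overlapping matches; the one beginning at the second lysine (index 2, KKRK) is the alignment referred to in the text.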

Although the classic studies on SV40 T antigen clearly
demonstrate that PKKKRKV is a nuclear localization
signal, a recent study by Rihs and Peters [124] hints at
greater complexity. Rihs and Peters [124] have challenged
the notion that this minimal NLS is the lone
determinant of T antigen import in vivo. By measuring
the kinetics of nuclear accumulation of different SV40 T
antigen-β-galactosidase hybrid proteins, they found that
while the canonical NLS is sufficient to mediate nuclear
import over very long time periods (15 h), hybrids
bearing the same sequence plus residues normally
immediately N-terminal to the minimal NLS are imported
at a markedly more rapid rate (within 8 to 10 min). This
extra portion of the T antigen, residues 111-125
(SSDDEATADSQHSTP), is not basic and shows no
homology to the minimal signal. This region may function
either as a second independent NLS, or merely to
increase the efficiency of the minimal NLS. Interestingly,
this region contains several amino acids known to
be phosphorylated, either in the cytoplasm or in the
nucleus. Although it remains to be seen if phosphorylation
indeed plays a role here, these observations invite
speculation about the possible role of phosphorylation
in regulating nuclear import, and should foster a wide
range of experiments with this and other proteins. Also,
mutations in other regions of SV40 large T antigen can,
curiously, inhibit nuclear import [32,151].

Although nucleoplasmin was the first protein with
which it was demonstrated that a protein domain
mediates nuclear import [33], identification of the actual
NLS proved more exacting. By homology, four potential
nuclear targeting sequences are present in the
localization-promoting C-terminal tail of nucleoplasmin.
Deletion analysis revealed that the sequence responsible
for targeting nucleoplasmin to the nucleus is composed
of two of these sequences, one (RKATK) similar to
the N-terminal NLS of MATα2 and the other (AKKKK)
to the minimal NLS of SV40 large T antigen
[16,32]. Whether this combination of the MATα2 and T
antigen motifs constitutes one complex signal or two
independent juxtaposed signals remains to be determined.

Sequences from several other proteins which have
been shown or thought to confer nuclear localization
are listed in Table I. In many cases the indicated
sequences cannot be taken too literally since they often
represent only a NLS-homologous sequence taken from
within a larger domain defined by deletions as being
important in mediating localization.

III-B. Dual signals 

A growing number of proteins have been shown to
have two independent nuclear localization signals
(Table I). Among the better characterized examples of dual
signals are those of polyoma large T antigen [122] and
yeast ribosomal protein L29 [145]. Polyoma large T
antigen contains two basic sequences (VSRKRPRPA
and PKKARED) which cooperate to mediate nuclear
entry. Mutation of either one individually impairs but
does not eliminate the ability of polyoma virus T antigen
to enter the nucleus; mutation of both results in
only cytoplasmic T antigen. Similarly, ribosomal protein
L29 has two basic SV40 T antigen-like signals.
Either signal alone is sufficient to direct β-galactosidase
to the yeast nucleus. Mutation of the C-terminal signal
(KHRKHPG) has no evident effect on L29 localization;
mutation of the N-terminal signal (KTRKHRG) partially
impairs localization, and a combination of two
mutant signals leads to an even greater loss of localization
activity. These two signals are identical in five
positions.
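
The statement that the two L29 signals are identical in five positions can be checked with a position-by-position comparison. The function name below is an assumption chosen for illustration; the two heptapeptides are those quoted above.

```python
def identical_positions(seq_a, seq_b):
    """Return the 0-based indices at which two equal-length
    sequences carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a == b]


if __name__ == "__main__":
    n_terminal = "KTRKHRG"  # N-terminal L29 signal
    c_terminal = "KHRKHPG"  # C-terminal L29 signal
    shared = identical_positions(n_terminal, c_terminal)
    print(len(shared), shared)
```

The signals differ only at the second and sixth positions (T/H and R/P).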

Why should a protein have two signals? Either the
signals are different and have distinct, perhaps
interdependent functions (like an arm and a leg), or the
signals are functionally equivalent and additive (like
two legs). The general assumption is that the signals are
equivalent and additive (see section III-D). However, in
most cases of identified dual signals, the signals are
not sufficiently characterized to make firm statements
about relative function.

In at least one case of a dual signal, the two signals
have been proposed to be functionally distinct [63]. A
second, internal signal has been identified in MATα2.
This internal NLS shares no homology with either the
previously identified N-terminal signal of MATα2 or
other identified NLSs (Table I). It lies within the
homeodomain of MATα2, suggesting that other
homeodomain-containing proteins may bear homologous NLSs,
perhaps evolved to coordinate nuclear import with DNA
binding. The finding that MATα2 has two non-homologous
signals, combined with the finding that mutation of
each signal individually has different effects, has led to
a model in which each signal supports a different step
in nuclear import [63]. According to this model, the
N-terminal signal could mediate a receptor-dependent
association with the nuclear pore complex and the internal
signal could subsequently engage the transport machinery
to foster translocation. This model provides a
molecular basis for the known two steps in nuclear
import, binding to the nuclear pore and subsequent
ATP-dependent translocation through the pore [105,121].
By this model, many proteins might bear two
signals, or those that have only one signal might be
imported more slowly.

III-C. Context and flanking sequence affect NLS function

In the studies elucidating nuclear import of the SV40 
large T antigen, Kalderon et al. [75] observed that 
deletions to either side of the minimal NLS do not 
markedly affect the steady state of import. These dele- 
tions necessarily introduce new flanking sequences about 
the NLS. In this case, NLS function is not dramatically 
sensitive to the nature of abutting sequences. In later 
work, Roberts et al. [126] specifically addressed the role 
of sequence context by introducing the SV40 NLS into 
several different positions within the cytoplasmic en- 
zyme pyruvate kinase. Among five constructs bearing 
this signal at different locations, only one insertion was 
unable to direct nuclear import. By extrapolating from 
the known three dimensional crystal structure of cat 
muscle pyruvate kinase to the chicken enzyme, Roberts 
et al. [126] found that four of the fusion proteins have 
their NLS on exposed surfaces whereas the sole non- 
functional position corresponds to a buried hydro- 
phobic domain in which the NLS may be masked and 
thereby inactivated. 

In one other case of context effects on NLS function,
Nelson and Silver [104] fused the GAL4 and SV40 T
antigen NLSs to both invertase and β-galactosidase. In
this case, both signals drive more efficient import of
invertase than of β-galactosidase. They conclude that
context affects the presentation of both nuclear localization
signals. However, this conclusion is tempered by
the possibility that other features are likely to render
some proteins less facile at nuclear import than others.
In this case, for example, β-galactosidase is a tetramer,
and one might expect a dimer of invertase (approx. 120
kDa) to win the import race when competing against a
β-galactosidase tetramer weighing in at approx. 450
kDa.

III-D. Weak signals are additive

A single NLS can clearly target a carrier protein to
the nucleoplasm. Why then do not all nuclear proteins
simply carry one signal? One consideration is that since
NLSs are part of the mature protein, their sequence
must be compatible with the overall function of the
protein. To satisfy this, one might imagine that different
signals have evolved to relax the constraint that all
nuclear proteins bear one highly conserved signal.
Secondly, it might be unnecessary for a protein to have one
strong targeting signal if several weaker ones can act
together (see section III-B).

Roberts et al. [126] explicitly tested the notion that
reiterations of a partially defective NLS could be sufficient
to drive import. The weakly active K128R variant
of the SV40 T antigen NLS was fused once, twice, or
thrice to pyruvate kinase. One signal yielded equal
amounts of nuclear and cytoplasmic localization, two
produced greater nuclear localization than cytoplasmic,
and three proved sufficient to give exclusively nuclear
localization. In further support of this, Lanford et al.
[84] found that the inability of peptides bearing a defective
SV40 T antigen signal to target carrier mouse IgG
to the nucleus was mitigated when conjugates contained
more peptides. Moreover, when Dworetzky et al. [35]
studied import of colloidal gold particles decorated with
BSA coupled to peptides bearing the wild type SV40 T
antigen NLS, they observed that as the number of
cross-linked peptides increased, the relative uptake of
particles and the functional size of the pore channel also
increased. There are also other examples in which
reiteration of a NLS leads to faster, more pronounced nuclear
import [33,44,85]. Clearly, these findings support a
model in which the effects of partially functional signals
are additive. Operationally, this means that a given
protein may have no single strong consensus match to
any other NLS (such as the classical SV40 signal), but
rather several poor matches that act additively to promote
nuclear import. Thus it is impossible to assign a
priori a role to any single sequence. Additionally, in
these cases it should prove impossible to find by homology
a NLS sequence, and difficult experimentally to
determine which parts of the protein are functional. These
may be the reasons that all known studied examples of
nuclear imported proteins have at most two signals. For
any protein with more than two signals, it may simply
prove, or have proved, intractable to define the regions
active in import.
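
The idea that several poor matches act additively can be made concrete with a toy score. Everything in the sketch below is an assumption made for illustration: the scoring scheme (one point per satisfied position of the Lys-Arg/Lys-X-Arg/Lys consensus), the function names, and the weakened peptide PKTKRKV, which is a hypothetical example and not a construct from the studies cited. It is not a model of import efficiency.

```python
def window_score(window):
    """Count how many of the three constrained consensus positions
    (1st = Lys, 2nd = Arg/Lys, 4th = Arg/Lys) a 4-residue window satisfies."""
    return int(window[0] == "K") + int(window[1] in "RK") + int(window[3] in "RK")


def best_score(peptide):
    """Best consensus score (0-3) over all 4-residue windows of a peptide."""
    peptide = peptide.upper()
    windows = [peptide[i:i + 4] for i in range(len(peptide) - 3)]
    return max((window_score(w) for w in windows), default=0)


def additive_score(signals):
    """Toy assumption: contributions of separate signals simply add."""
    return sum(best_score(s) for s in signals)


if __name__ == "__main__":
    strong = "PKKKRKV"  # SV40 minimal NLS, full consensus match
    weak = "PKTKRKV"    # hypothetical partial match
    print(best_score(strong), best_score(weak))
    print(additive_score([weak] * 3))
```

Under this toy scheme three copies of the partial match together outscore a single full match, which mirrors the qualitative observation that reiterated weak signals can suffice for nuclear localization.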

III-E. Competition between nuclear and non-nuclear
localization signals

In a number of either natural or artificial cases,
proteins have different signals for targeting to different
organelles. What happens when a protein bears a NLS
and a second signal for targeting to another cellular
compartment?

Mitochondrial vs. nuclear localization signal. Cytochrome
c1 is a mitochondrial protein that bears a typical
N-terminal mitochondrial targeting signal. When the
SV40 T antigen NLS is fused to the N-terminus of
cytochrome c1, the hybrid protein is directed instead to
the nucleus, based on immunofluorescence localization
and loss of cytochrome c1 function [131]. In contrast,
when the yeast nuclear protein HAP2 is fused to the
C-terminus of the mitochondrial protein HEM15, this
chimera is localized to mitochondria (M. Haldi and L.
Guarente, pers. comm.). These findings suggest that
when a protein carries both mitochondrial and nuclear
localization information, the most N-terminal signal
dominates. Because mitochondrial import could be
cotranslational, proteins bearing the mitochondrial
sequence at the N-terminus may initiate mitochondrial
import before the NLS has been translated. One might
also expect that a nascent NLS could be recognized by
a signal binding protein and that nuclear import would
begin prior to translation of a more C-terminal
mitochondrial signal.

The yeast TRM1 protein, an enzyme that modifies
both mitochondrial and cytoplasmic tRNAs, is located
in both the nucleus and mitochondria. The N-terminus
of TRM1 (residues 1-48 or 17-48) is sufficient to target
cytochrome oxidase subunit IV or dihydrofolate reductase
to mitochondria [36]. In contrast, a longer region
(1-213) targets β-galactosidase to both nuclei and
mitochondria [91]. Shorter fusions to β-galactosidase
(1-70 or 17-70) are not nuclear localized and, surprisingly,
cause the loss of mitochondrial DNA, rendering it
problematic to show that the protein colocalizes with
DAPI-stained mitochondria. The present model is that
a N-terminal mitochondrial signal and a NLS that lies
somewhere between residues 70-213 are both functional
in targeting to their respective destination [91].
Thus, in at least this case, one protein can be targeted to
two different locations by two independent signals.

Secretory vs. nuclear localization signal. In three examples
where a protein bears both a secretory signal
sequence and a nuclear targeting signal, an influenza
virus hemagglutinin-SV40 large T antigen hybrid [133],
the 78 kDa glucose regulated protein [102], and
platelet-derived growth factor (PDGF) [87,94a], the protein
enters the ER rather than the nucleus. In all cases,
the export signal sequence is N-terminal and precedes
the NLS or the presumed NLS. Thus, as in the case
with mitochondrial targeting, these proteins can be
cotranslationally inserted into the ER such that transfer is
initiated prior to synthesis of the NLS. The two transport
machineries would, therefore, never be directly
competing for substrate.

In a more complex example, the hepatitis B virus 
precore protein, the protein's N-terminal signal se- 
quence initiates ER import, but following signal pepti- 
dase cleavage translocation is aborted [112]. The 
processed protein is then released or leaks back into the 
cytoplasm. The NLS then directs nuclear import. An 
interesting possibility in this case is that a cytoplasmic 
NLS-binding protein is responsible for arresting trans- 
port into the ER. 

Membrane anchor vs. nuclear localization signal. Poly- 
oma virus middle T antigen bears a C-terminal hydro- 
phobic domain that binds the protein to the plasma 
membrane. Roberts et al. [126] inserted the SV40 large 
T antigen NLS into polyoma virus middle T antigen. 
The resulting hybrid was completely cytoplasmic. When 
the C-terminal hydrophobic domain was deleted, this 
truncated hybrid entered the nucleus, arguing against 
the possibility that the NLS was inactive in the intact 
hybrid because it was not exposed on the protein surface. 
These findings suggest that a plasma membrane hydro- 
phobic anchor can over-ride a NLS. 

III-F. Subnuclear localization

The nucleus is arguably the largest organelle in the 
cell. While most nuclear proteins need carry an address 
that directs them only to cross the nuclear envelope, we 
expect some must carry a more detailed address that 
sends them to their ultimate intranuclear destination. 
Recent studies provide detailed views as to how pro- 
teins are targeted to the inner nuclear membrane and to 
the nucleolus. 

The structural integrity of the nucleus, as well as its 
assembly following mitosis, depends on the lamina, a 
fibrillar network composed of the nuclear lamins [49]. 
The function of the lamins requires their intimate as- 
sociation with the inner membrane, where they form a 
supporting scaffold. Lamins A and B both have a 
CAAX sequence motif at their C-termini which leads to 
a series of processing events that endow the lamins with 
hydrophobic membrane anchors or targeting signals 
[37,70,81,149]. The cysteine of the CAAX box is 
farnesylated, the three C-terminal residues are proteo- 
lyzed, and the C-terminus is carboxymethylated [54,57]. 



These modifications are thought to direct the proteins 
to the inner membrane. It is not yet clear if CAAX 
processing occurs in the cytoplasm, although this seems 
likely since plasma membrane associated (RAS; γ-subunits
of G-proteins) and secreted proteins (fungal mating
factors) are also subject to CAAX box processing.
Once bound to the inner nuclear membrane, lamin A 
undergoes further proteolytic processing that removes 
2-3 kDa from the C-terminus, including the CAAX 
modifications [149], and lamin B associates with a 
specific lamin B receptor which could potentially recog- 
nize the farnesylated C-terminus [153]. Thus, in both 
cases the CAAX modifications may, in the end, not 
function as hydrophobic membrane anchors, but might 
only direct the lamins to the membrane where lamin- 
lamin or lamin-receptor interactions bind the lamins to 
the inner membrane. 

Recent studies have revealed that several retroviral
regulatory proteins, Rex of HTLV-1, Rev of HIV, and
Tat of HIV, reside in the nucleolus [20,25a,82,139,140].
Many of the techniques originally used to study nuclear
localization have been employed to address the question
of how proteins are sent to the nucleolus. Point mutations
and deletions were first used to implicate specific
sequences in nuclear/nucleolar targeting. Subsequently,
fusions were constructed to β-galactosidase, pyruvate
kinase or, in one case, to another retroviral protein,
p40x, that is already nuclear [25a,140]. The view that
has emerged is simple and elegant. Tat, Rex, and Rev
all carry a basic nuclear targeting signal. For Rex
(PKTRRRP), this sequence is somewhat homologous to
the SV40 NLS (PKKKRKV), while those of Tat
(GRKKR) and Rev (RRNRRRRW) are less similar.
For each of these proteins, an additional flanking sequence
appears to modify the address from nuclear to
nucleolar. As few as 12 (Tat), 16 (Rev), or 20 (Rex)
amino acids can serve as a nucleolar localization sequence
(NuLS), i.e., can direct a passenger protein from
the cytoplasm all the way to the nucleolus. Point mutations
within these short sequences have further delineated
the NuLS. Based on only these three proteins, a
rough consensus of the additional sequence, or 'subsignal',
which modifies a NLS to convert it into a NuLS is
a glutamine residue flanked on both sides by several
basic residues, usually arginines, i.e., RRRQRRR. This
sequence is adjacent to or even overlapping the NLS
which it modifies. Tat (GRKKRRQRRR) and Rex
(PKTRRRPRRSQRKR) have one copy of the subsignal,
in both cases located on the C-terminal side of the
NLS. Rev (RQARRNRRRRWRERQR) appears to have
two less conserved subsignals, one on each side of the
NLS. Perhaps in this case, the two subsignals are weak
individually and therefore two are required to properly
modify the NLS. The finding that nucleolar localization
signals are modified NLSs raises interesting questions
as to their mechanism of action. Is the NuLS simply an
NLS in which the interaction with a binding protein is
altered such that the binding protein is not disengaged
until the nucleolus has been reached? Alternatively,
there could be a second receptor, either in the cytoplasm
or the nucleus, which recognizes the NuLS and
which could functionally interact with the NLS binding
protein. Since the NuLS is a modified NLS, it seems
unlikely that it acts simply by binding to a nucleolar
component after random diffusion into the nucleolus.
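
The rough subsignal consensus (a glutamine flanked on both sides by basic residues) can be examined on the three quoted sequences by counting lysine and arginine residues on each side of every glutamine. The function name and the flank width of three residues are assumptions chosen for illustration only.

```python
BASIC = set("KR")  # lysine and arginine


def glutamine_flanks(seq, flank=3):
    """For each glutamine, return (index, basic residues among the
    `flank` residues before it, basic residues among the `flank` after it)."""
    seq = seq.upper()
    report = []
    for i, aa in enumerate(seq):
        if aa != "Q":
            continue
        left = seq[max(0, i - flank):i]
        right = seq[i + 1:i + 1 + flank]
        n_left = sum(1 for c in left if c in BASIC)
        n_right = sum(1 for c in right if c in BASIC)
        report.append((i, n_left, n_right))
    return report


if __name__ == "__main__":
    sequences = {
        "Tat": "GRKKRRQRRR",
        "Rex": "PKTRRRPRRSQRKR",
        "Rev": "RQARRNRRRRWRERQR",
    }
    for name, seq in sequences.items():
        print(name, glutamine_flanks(seq))
```

With this flank width, Tat shows a single glutamine with three basic residues on each side, Rex a single glutamine with fewer basic residues on its N-terminal side, and Rev two glutamines with sparser basic flanks, consistent with the description of Rev as carrying two less conserved subsignals.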

A number of other proteins have other specific sub- 
nuclear locations. In yeast, for example, the tRNA 
modifying enzyme TRM1 (N2,N2-dimethylguanosine
tRNA methyltransferase) [91] and RNA11 [17], a component
of the spliceosome required for mRNA splicing,
are located at the nuclear periphery, within 300 nm of
the nuclear envelope. tRNA ligase is found at two
subnuclear locations, the extreme periphery of the
nucleus at what appear to be nuclear pores and at spots 
more centrally located, approximately 100 nm from the 
envelope [19]. Clark and Abelson [19] suggest that the 
ligase associates with nascent tRNAs in the more central
spots and then migrates, as an RNA-enzyme complex, 
to the pore where the entire processing machinery is 
located. In amphibian and insect oocytes, snRNPs are 
found in a newly described intranuclear structure called 
the sphere organelle [47]. Sphere organelles, of which
there are a few dozen per nucleus, are located either 
extrachromosomally or in association with specific 
chromosomal loci called the sphere organizers, by anal- 
ogy with nucleoli and the nucleolar organizing center. 
Gall and Callan [47] suggest that the spheres assemble 
snRNPs in much the same way that nucleoli build 
ribosomes. In somatic cells, snRNPs are found, by 
indirect immunofluorescence and 3D reconstruction, in 
20 to 50 irregularly shaped speckles connected by a 
reticulum which forms a network extending throughout 
the nucleoplasm [46,90,142]. How any of the above 
proteins or particles find their specific subnuclear loca- 
tions, if not by random diffusion, is unknown. 

IV. Cellular components 

IV-A. Nuclear pore complex (NPC) 

Proteins enter [35,40] and RNA exits [34] the nucleus 
through the nuclear pore complex. A NPC not only 
supports the simultaneous [34], bidirectional flow of 
these two macromolecules but must do so under very 
heavy traffic conditions in a rapidly growing cell. The 
imposing structure of the NPC is, thus, not so surpris- 
ing. The structure of the NPC, from amphibian oocytes, 
has been determined by high resolution electron mi- 
croscopy combined with image processing [2,119,146]. 
The NPC is composed of two coaxial rings connected to 
a central plug by spokes. The two rings are on the 
surfaces of the nuclear envelope, the so-called outer ring 



on the cytoplasmic surface and the inner ring on the
nucleoplasmic surface. As viewed down the central axis
of the rings, perpendicular to the plane of the nuclear 
envelope, the NPC has eightfold symmetry highlighted 
by eight spokes which extend from the outer edge of the 
pore complex to a central structure. This central struc- 
ture is variously referred to as the central plug, central 
granule or transporter. As implied by its name, the 
transporter is thought to contain the transport channel 
(see section IV-C). The NPC has an overall diameter of 
approx. 140 nm; the rings have an inner diameter of 
approx. 80 nm and the central plug has a diameter of 
approx. 35 nm. The density of NPCs in the nuclear 
envelope is usually proportional to the actual or, as in
the case of the Xenopus oocyte, the anticipated metabolic
activity of the cell [98]; the metabolically inactive
lymphocyte has approx. 3 pores/µm², whereas the
Xenopus oocyte has approx. 50 pores/µm².

The structural complexity of the NPC is often under- 
appreciated. Mass determination of the NPC by high 
resolution scanning transmission electron microscopy 
has yielded a molecular mass of approx. 124 MDa [119], 
approx. 40-times the mass of a ribosome. The NPC 
could thus easily consist of over 100 different poly- 
peptide species. Despite the large size and structural 
complexity of the NPC, only a handful of constituent
proteins have been identified [22,27,28,50,141,154].
Many, if not all, of these proteins are glycoproteins
containing O-linked N-acetylglucosamine (GlcNAc)
residues (for review see Ref. 68). Whether any of these
proteins are actually part of the import machinery re- 
mains to be determined. 

In addition to actively transporting proteins and 
RNA, the NPC also allows passive diffusion of small 
molecules, including proteins, across the nuclear
envelope [113; for review see Ref. 11]. The exclusion size
limit for diffusion of a globular protein is approx. 60 
kDa, although diffusion becomes rate-limited at approx. 
20 kDa. The physiological significance of this passive 
diffusion is not known. The site of the passive diffusion 
channel in the NPC and its relationship to the active 
transport channel are also not known. 

IV-B. Receptor (NLS-binding proteins) 

The existence of receptors for nuclear targeting signals
has been inferred from the specificity and saturability
of nuclear protein import [61]. While the data on
saturability of nuclear protein import are conflicting, 
the evidence for specificity is clear cut. 

Nuclear import is a highly specific process. Under 
physiological conditions, the nucleus has a very differ- 
ent protein composition compared to the cytoplasm. 
This is accomplished despite the mixing of cytoplasmic 
and nuclear contents that takes place during each mitotic 
cycle in higher eukaryotes, when the nuclear envelope
breaks down at the beginning of prophase. Early experiments
in which labeled proteins were injected into frog
oocytes showed that the nucleus readily discriminates
between nuclear and cytoplasmic proteins [10]. The
subsequent discovery of nuclear localization signals, in
which single amino acid substitutions impair function 
[75,83], showed that a specific sequence in a nuclear 
protein is the basis of this discrimination. The conclu- 
sion from these results is that specific recognition mech- 
anisms are involved in nuclear protein localization, im- 
plying the existence of receptors for nuclear targeting 
signals. 

Besides specificity, the other hallmark of receptor- 
mediated processes is saturability. Nuclei seem to have 
a large capacity to accumulate nuclear proteins; there- 
fore, one would not expect to reduce the final extent of 
accumulation of a given protein by adding an excess of 
another, even if they use the same receptors. On the 
other hand, the rate of accumulation should be reduced 
by an excess of competitor protein or peptide if there is 
a limited number of common receptors. Analysis of the 
localization of SV40 T antigen signal peptide-BSA con- 
jugates injected into Xenopus oocytes has provided evi- 
dence that the rate of nuclear import is saturable. These 
conjugates are imported into nuclei with saturable
kinetics (approximate Km of 1.8 µM and a Vmax of
6.4 × 10^9 molecules/cell/min) [56]. Furthermore, the import
rate of the conjugates is reduced by millimolar
concentrations of free wild type signal peptide but not 
by a mutant (import deficient) signal peptide. In a rat 
cell free system [96], a 50-fold excess of cold T antigen 
reduces the nuclear accumulation of labeled T antigen, 
although 10 mM free signal peptide, surprisingly, had 
no effect. 
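
The saturable kinetics described above can be illustrated with a short numerical sketch. It assumes that import follows a simple Michaelis-Menten rate law with the Km and Vmax reported in [56], and that free signal peptide behaves as a purely competitive inhibitor. The competitive model, the function name `import_rate` and the inhibition constant `ki_um` are illustrative assumptions and are not taken from the studies cited.

```python
def import_rate(substrate_um, km_um=1.8, vmax=6.4e9,
                competitor_um=0.0, ki_um=None):
    """Return an import rate in molecules per cell per minute.

    km_um and vmax default to the approximate values reported for
    signal peptide-BSA conjugates [56]. competitor_um and ki_um
    describe a hypothetical competitive inhibitor (e.g. free signal
    peptide); ki_um is an assumed value, not a measured one.
    """
    if substrate_um < 0 or competitor_um < 0:
        raise ValueError("concentrations must be non-negative")
    apparent_km = km_um
    if competitor_um > 0:
        if ki_um is None or ki_um <= 0:
            raise ValueError("a positive ki_um is needed with a competitor")
        # Competitive inhibition raises the apparent Km, leaving Vmax unchanged.
        apparent_km = km_um * (1.0 + competitor_um / ki_um)
    return vmax * substrate_um / (apparent_km + substrate_um)


if __name__ == "__main__":
    for s in (0.1, 1.8, 10.0, 100.0):
        print(f"{s:6.1f} uM conjugate -> {import_rate(s):.3g} molecules/cell/min")
```

Under these assumptions the rate is half-maximal when the conjugate concentration equals Km and approaches Vmax at high concentrations, while an excess of competitor lowers the rate at any sub-saturating conjugate concentration without changing the maximal rate.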

The existence of a limited number of saturable bind- 
ing sites has been inferred from studies with a mutated 
SV40 large-T antigen that is cytoplasmic despite having 
a wild-type nuclear targeting signal. This protein inter- 
feres with the nuclear localization of wild-type T anti- 
gen and other nuclear proteins [132]. The interference is 
eliminated when the nuclear targeting signal of the 
cytoplasmic mutant is inactivated. This can be interpre- 
ted as evidence for a saturable number of receptor sites; 
the cytoplasmic mutant presumably binds an import 
receptor which, because the mutant is not translocated 
into the nucleus, remains occupied and unavailable for 
other proteins. This SV40 large-T antigen mutant does 
not interfere with the nuclear localization of all nuclear 
proteins tested [132], suggesting that the observed inter- 
ference does not involve a binding site common to all 
nuclear proteins. Similar findings and conclusions are 
reported in a study on a mutant ICP4 protein of herpes 
simplex virus. A mutation in ICP4 inhibits entry of this 
and other, but not all, viral nuclear proteins [79]. Ad- 
ditional evidence for the involvement of a saturable 
binding site prior to translocation has come from 
analyzing the localization of small nuclear and non- 
nuclear proteins injected into the cytoplasm of cultured 
cells [15]. The nuclear protein histone H1 and SV40 T 
antigen signal-cytochrome C conjugates are retained in 
the cytoplasm at 4°C, whereas non-nuclear proteins of 
similar size diffuse into the nucleus. The cytoplasmic 
retention of histone H1 is overcome by injection of 
excess H1. This saturable cytoplasmic retention suggests 
that there are cytoplasmic receptors for nuclear localiza- 
tion signals that remain 'frozen' in place at low temper- 
ature. The excess H1 is presumably free to enter the 
nucleus by diffusion. 

Although the above results indicate the involvement 
of a saturable signal recognition step, conflicting data 
on the saturability of nuclear import have been ob- 
tained. Others have found, in experiments involving 
injection into cultured cells, that millimolar concentra- 
tions of signal peptides do not reduce the rate of import 
of nanomolar concentration of SV40 signal-peptide- 
phycoerythrin conjugates [152]. The difficulty in obtain- 
ing coherent results from experiments measuring import 
kinetics probably stems from the technical complexity 
of such experiments. 

The lack of a consensus sequence among the nuclear 
targeting signals identified so far (Table I) makes specific 
recognition by the translocation machinery difficult to 
explain. Three hypothetical mechanisms can be envi- 
sioned. First, there could be a receptor for each signal 
sequence. This would require a large number of differ- 
ent receptors on a limited number of pore complexes. 
Second, there could be a single receptor with broad 
specificity. This is hard to reconcile with the observed 
high specificity of nuclear protein localization, unless 
that which is recognized is not the primary structure of 
the signals but some other common determinant, e.g., a 
secondary structure motif. Third, there could be adap- 
tor molecules in the cytoplasm that bind individual 
signals and bring them to a common receptor on the 
nucleus. The receptor would then recognize a determi- 
nant present only in the complex between the nuclear 
protein and the adaptor molecule. Of course other inter- 
mediate possibilities exist and there is little evidence to 
support any model at this stage. But the data collected 
so far tend to support the adaptor model. For example, 
signal-binding proteins have been described to be pre- 
sent in the cytoplasm, nuclear envelope and nucleo- 
plasm (see section IV-B below), and it is known that 
colloidal gold particles coated with different nuclear 
proteins, T antigen and nucleoplasmin, are imported 
through the same pores, arguing against the presence of 
different receptors in different pores [35]. 

Putative receptors for nuclear localization signals 
have been detected (Table II) by two different tech- 
niques, direct visualization by electron microscopy of in 
situ binding and cell free binding studies. The probes 
used in the cell free binding studies have been either 
signal-bearing peptides or antibodies, although a recent 
report on the inhibition of binding by NEM [106] could 
provide another valuable tool for defining binding sites. 

EM studies. Evidence that nuclear proteins have dis- 
crete binding sites on the nuclear envelope has been 
provided by electron microscopic studies on frog oocytes 
injected with nucleoplasmin-coated gold particles. Early 
experiments showed that the injected particles are not 
free to diffuse immediately before or after translocation 
through nuclear pores, but are kept instead bound to 
structural elements on both sides of the pore [40]. 
Subsequent work extended these observations by show- 
ing that nucleoplasmin-coated colloidal gold binds to 3 
nm fibrils on the cytoplasmic side of the pore [121]. 
Binding to the octagonal array of spokes and the central 
plug or transporter of the pore complex (see section 
IV-A) has also been observed [3]. The role of these 
different binding sites is uncertain; they could act 
sequentially. The cytoplasmic filaments could bind the 
import substrates first, conferring vectorial movement 
toward the nuclear pore where they could be transferred 
to docking sites in the central transporter of the pore 
complex [3]. 

Despite the apparent role for cytoskeletal elements in 
nuclear protein import, disruption of the cytoskeleton 
with colcemid plus cytochalasin E does not affect the 
nuclear localization of an endogenous heat-shock pro- 
tein or an injected adenoviral protein E1a [150]. How- 
ever, the effect of this treatment on the karyoskeleton or 
on the pore-associated fibrils is unknown. 

Cell-free binding studies. A number of nuclear pore 
complex proteins have been identified with lectins or 
antibodies raised against nuclear envelopes [27,50,66, 
141]. However, we will discuss as putative import recep- 
tors only those proteins which are either recognized by 
antibodies that inhibit nuclear import, or which bind 
previously characterized nuclear targeting signals with 
some specificity. Proteins that meet these criteria have 
been found in different biological systems (see Table 
II). In the following discussion and in Table II, the 
molecular masses of the proteins identified with the aid of 
crosslinkers do not include the mass of the crosslinked 
ligand. Therefore direct comparisons can be made with 
the proteins detected with antibodies. Furthermore, it is 
important to note that the physiological relevance to 
nuclear import of the binding proteins identified so far 
remains to be determined. However, the identification 
of any proteins at all is a promising development. 

Most studies on signal-binding proteins have been 
performed with rat hepatocytes, and chemical cross- 
linking has generally been used to detect the association 
of peptide ligands to putative receptors. In the first such 
study, Adam et al. [1] describe the binding of radio- 
labeled peptides bearing the SV40 large-T antigen signal 
to two proteins of approx. 56 and 66 kDa in rat liver 
homogenates. The two proteins were found in the cyto- 
plasm, the nuclear envelope and the nucleoplasm. Inter- 
estingly, the binding activity of the nuclear envelopes 
had to be solubilized with detergent and high salt for 
optimal detection; Adam et al. [1] speculate that this 
might be because the proteins are associated with en- 
dogenous ligands. The binding did not require ATP and 
was competed by unlabeled peptides with the wild-type 
sequence but not by mutant peptides unable to direct 
nuclear targeting, indicating that the binding is specific. 
Binding of the signal peptide to p56 was analyzed in 
more detail and was found to be saturable at about 80 
nM peptide, regardless of the original cellular location 
of the p56. The different cellular locations of both p56 
and p66 suggest that these proteins serve as carriers of 
nuclear proteins from cytoplasm into the nucleoplasm. 
In a similar study with rat liver cells, Yamasaki et al. 
[157] used peptides bearing the nuclear targeting signals 
of five different nuclear proteins as ligands. It was 
found that each peptide has a unique pattern of affini- 
ties toward four different binding proteins. Two of the 
proteins (58 and 98 kDa) are cytoplasmic and the other 
two (53 and 138 kDa) are loosely nuclear, as shown by 
their extractability with salt or nuclease digestion. There 
was no unique binding protein for any individual signal, 
but the relative affinities of each peptide for the four 
proteins differed. However, in general there is little 
correlation between the binding affinities of signal 
peptides toward specific proteins, as measured in this 
study, and the ability of these signals to target proteins 
to the nucleus in vivo or in vitro. In the case of two 
peptides derived from the nucleoplasmin signal, one 
with a complete targeting signal has little, if any, affin- 
ity for the binding proteins, but it efficiently targets IgG 
conjugates to the nucleus in a microinjection assay 
using rat cells [84]. Conversely, a signal peptide with a 
truncated nucleoplasmin signal bound efficiently to the 
binding proteins, but was much less efficient at target- 
ing conjugates. A nuclear targeting signal from the yeast 
protein MATα2, which is capable of targeting β- 
galactosidase to the yeast nucleus in vivo [64] but not 
conjugated proteins upon injection into mammalian cells 
[18,84], did not show significant affinity for four of the 
five binding proteins. A peptide bearing a mutant SV40 
T antigen signal, which functions inefficiently in nuclear 
targeting in vivo and in vitro (Ref. 84, for review see 
Ref. 125), had almost the same affinity pattern for the 
binding proteins as the wild-type version of the same 
signal, suggesting that these binding proteins do not 
include those described by Adam et al. [1]. Although the 
reason for the discrepancy between binding affinity and 
localization ability is not clear, Yamasaki et al. [157] 
emphasize that the conformational requirements for in 
vitro binding may differ from those required for nuclear 
import in vivo. None of these proteins bind WGA (see 
section IV-C). 

Benditt et al. [7] have described additional signal- 
binding proteins, also from rat hepatocytes, using puri- 
fied nuclear envelopes as the starting material. The 
ligand employed in this study was a radiolabeled peptide 
containing the SV40 T antigen signal coupled to a 
photoactivatable crosslinker. The nuclear envelopes were 
extracted with 1% Triton X-100 and the extract was 
incubated with the ligand. Binding proteins present in 
the extract were covalently crosslinked to the radio- 
labeled ligand by UV irradiation. Four major protein 
species were labeled in this manner; they have apparent 
molecular masses of 56, 57, 65 and 74 kDa. A 100-fold 
excess of unlabeled ligand reduced the crosslinking with 
the labeled peptide by 60%, suggesting that crosslinking 
was a consequence of specific binding; a targeting-de- 
fective mutant peptide was not tested as a competitor. 
Two of the proteins identified in this study (56 and 65 
kDa) could correspond to the binding proteins of 56 
and 66 kDa described by Adam et al. [1]; however, it is 
unlikely that the 65 kDa protein, an [...] protein, 
corresponds to the cytosolic 68 kDa protein described 
by Yamasaki et al. [157] (see above). 

Under the simple rationale that the receptor for a 
positively charged signal has a continuous stretch of 
negatively charged amino acids, Yoneda et al. [158] 
raised antibodies against a decapeptide containing the 
sequence DDDED. The resulting antiserum recognizes 
nuclear envelope antigens in a variety of cultured cell 
types, as revealed by immunofluorescence studies. This 
antiserum also inhibits nuclear accumulation of 
nucleoplasmin and signal peptide-BSA conjugates when 
injected into human embryonic lung cells. In Western 
blots of rat liver nuclear lysates the antiserum recog- 
nizes proteins of 59 and 69 kDa, reminiscent of the 
proteins described by others using signal peptides as 
ligands (see above and Table II). Attempts to fractionate 
the nuclear lysate and localize the antigens more pre- 
cisely have thus far been unsuccessful, even though a 
high salt-detergent extract of nuclear envelopes can 
adsorb 60% of the import-inhibiting activity present in 
the antiserum. For unknown reasons the proteins im- 
munoprecipitated from such an extract are not the same 
as those detected by Western blots, but a different set of 
proteins with apparent molecular masses of 34, 43, 50, 
54 and 65 kDa. How these proteins are related to 
nuclear import is not known, but since they are recog- 
nized by an antibody that inhibits import, they could be 
components of the import apparatus. 

Obviously the search for receptors involved in nuclear 
protein import is on a more secure basis when coupled 
with a functional assay. In this regard, frog oocytes are 
a valuable tool because they greatly facilitate micro- 
injection of import substrates and reagents and the 
separation of nucleus and cytosol. Using this system, 
Featherstone et al. [38] have found that a microinjected 
monoclonal antibody raised against rat liver nuclear 
envelopes inhibits nucleoplasmin import and RNA ex- 
port. The passive diffusion of non-nuclear proteins is 
not affected, indicating that only active import is in- 
hibited. When used as an immunofluorescence reagent, 
the antibody decorates the oocyte nuclear envelope but, 
surprisingly, not the nuclei of the surrounding follicle 
cells. This suggests that the antigen may be present only 
in some cell types or only under certain physiological 
conditions. Western blot analysis of total oocyte nuclear 
proteins reveals antibody-binding components of 60 
kDa and 180 kDa. These proteins are therefore candi- 
dates for components of the import apparatus in the 
oocyte nucleus. It should be noted that the antibody 
used in these studies does not recognize the nuclear 
targeting signal-binding proteins identified in rat liver 
cells by the same group [1]. 

HeLa cell nuclear proteins have also been probed 
with nuclear targeting signal-bearing peptides. In this 
case the ligand was labeled not in the peptide moiety 
but in an attached photoactivatable, cleavable cross- 
linker. This technique allows the transfer of the label 
from the ligand to the receptor upon cleavage of the 
crosslinker. Using this approach, Li and Thomas [92] 
found that the SV40 T antigen signal peptide is capable 
of transferring the label to an acidic (pI 6) protein of 66 
kDa when incubated with the supernatant of sonicated 
nuclei, indicating that the signal binding protein is a 
soluble nuclear protein. An import-deficient mutant 
peptide fails to transfer the labeled crosslinker. To test 
the correlation between binding and import function an 
import assay based on lysolecithin-permeabilized cells 
was developed. Proteins that accumulate in the nucleus 
are capable of transferring the crosslinker to p66, 
whereas non-nuclear proteins do not transfer the cross- 
linker. In the course of these investigations, Li and 
Thomas [92] observed that Staphylococcus aureus pro- 
tein A is an avidly karyophilic protein that also associ- 
ates with p66, as assayed with the label transfer tech- 
nique. Peptides with the SV40 T antigen signal compete 
with the transfer of label from protein A to p66, even 
though protein A does not have an SV40-like nuclear 
targeting sequence. This indicates that p66 recognizes at 
least two signals with different primary structure. This 
is also the only case in which a nuclear targeting signal- 
binding protein apparently coincides with a major spot 
in a Coomassie-stained gel. The rest of the described 
binding proteins are minor cellular components. 

Yeast nuclear proteins separated by polyacrylamide 
gel electrophoresis and transferred to nitrocellulose 
membranes contain proteins of 59 and 70 kDa that bind 
a fragment of the yeast nuclear protein GAL4 and a 
SV40 T antigen signal peptide-HSA conjugate [138]. A 
nucleoplasmin signal peptide-HSA conjugate binds the 
same two proteins and two additional proteins of 95 
and 140 kDa. Conjugates with a mutated SV40 T anti- 
gen signal do not bind. All conjugates compete with 
each other and with a H2B conjugate for binding, 
indicating that the two proteins are capable of recogniz- 
ing nuclear targeting signals with different sequences. 
This lends support to the notion that nuclear receptors 
are not specific for the primary structure of the signal. 
These signal-binding proteins can be extracted by 0.5 M 
NaCl but not by 2% Triton X-100, indicating that they 
are held in the nucleus by ionic interactions. 

Lee and Melese [88] have found a signal-binding 
protein of 67 kDa in yeast nuclear envelopes. In this 
case, the ligands were HSA conjugated to either the 
wild-type SV40 T antigen or the H2B nuclear targeting 
signal. Conjugates carrying import-deficient mutant 
peptides did not bind. Since the protein is only extracta- 
ble with 8 M urea or 2% Triton X-100/2 M KCl, it 
appears to be part of a karyoskeletal structure, like the 
nuclear pore complex or the putative yeast nuclear 
lamina. This protein could correspond to the 70 kDa 
yeast protein described by Silver et al. [138]. 

Cellular location of NLS-binding proteins. The cellular 
location of the putative receptors for nuclear targeting 
signals is still an open question. Immunofluorescence 
studies with import-inhibiting antibodies place them at 
the nuclear periphery [38,158]. Electron microscopy 
using colloidal gold particles, coated with nuclear pro- 
teins or signal peptide conjugates, detects them on the 
pore complex [3] and its associated fibrillar arrays 
[40,121]. However, some of the biochemical fractiona- 
tion studies described above [1,157] and other studies 
[15,106] suggest that the receptor is cytoplasmic. Small 
numbers of receptors dispersed throughout the cyto- 
plasm would probably not be detected by microscopic 
techniques, thus there is not necessarily a contradiction 
between the studies whose findings on the cellular loca- 
tion of a receptor are discrepant. A unifying model is 
that a receptor binds a nuclear protein in the cytoplasm 
and delivers it to the nuclear periphery for transport 
into the nucleus. 

Unambiguous localization of nuclear protein binding 
sites will probably require a combination of different 
experimental approaches, since both microscopic and 
cellular fractionation studies are subject to artifacts. 
Preservation of native structures is always a concern 
when preparing samples for microscopic examination, 
and in practice cellular fractionation protocols do not 
always achieve clean separations. For example, it is 
difficult to distinguish peripheral membrane proteins 
from cytosolic proteins, or to define the degree of 
extractability that separates integral membrane and 
karyoskeletal proteins. 

IV-C. Transporter 

The nuclear protein translocation machinery, or 
transporter, is part of the nuclear pore complex (see 
section IV-A), transiently associates with its protein 
substrates, uses ATP and is inhibited by WGA. Other- 
wise, the structure and function of the transporter re- 
main largely unknown. 

WGA has been the most useful tool so far in analyz- 
ing the transporter, although it may not be generally 
applicable since it does not inhibit nuclear import in 
some in vitro systems [48,77,114]. WGA binds with 
highest affinity to di- and oligo-N-acetylglucosamine 
residues. Unlike other forms of glycosylation, O-glyco- 
sylation of proteins with N-acetylglucosamine is re- 
stricted to the cytosolic and nuclear compartments of 
the cell (for review see Ref. 68). There are several 
nuclear proteins that are O-glycosylated and bind WGA 
[27,28,66,69,159], but their role in protein import re- 
mains to be demonstrated. Evidence in favor of such a 
role is that a monoclonal anti-pore antibody, one of 
many which include N-acetylglucosamine as part of 
their epitopes, inhibits nuclear import in Xenopus 
oocytes [38]. Also, WGA [24,43,134,152,159] and a lectin 
from Datura stramonium with the same specificity [159] 
inhibit nuclear import, while other lectins with different 
sugar specificities do not [43,159]. Inhibition by WGA 
is concentration dependent [24,159] and is reversed upon 
addition of excess N-acetylglucosamine [43,159]. Bind- 
ing of this lectin to the nucleus does not affect the 
functional pore radius for free diffusion [24,74,159] or 
the mobility of nucleoplasmin in the cytoplasm [24]. 
WGA and nuclear proteins do not compete for binding 
to the nucleus [105], even though nucleoplasmin- and 
WGA-coated gold particles appear to bind to the same 
pore structures [3]. Thus, it is generally accepted that 
this lectin specifically blocks translocation without af- 
fecting protein binding or passive diffusion through the 
pore. 

That WGA-binding NPC proteins play a role in 
nuclear protein import has been elegantly demonstrated 
in a recent report [42]. Import-competent nuclear en- 
velopes assemble de novo from Xenopus egg extracts. 
When the extracts are depleted of WGA-binding pro- 
teins (major species of 60, 97 and 200 kDa) prior to 
nuclear assembly, the newly formed nuclear pores have 
normal morphology and the proper exclusion properties 
as measured by dextran diffusion, but are deficient for 
binding and import of signal peptide-HSA conjugates. 
Addition of the missing proteins, even from heterolo- 
gous sources, restores binding and import. The finding 
in this report [42] that nuclei without WGA-binding 
proteins are not even able to bind an import substrate is 
surprising in light of the previous finding by the same 
group that WGA does not affect binding [105] but only 
subsequent translocation. To account for this apparent 
discrepancy, the authors offer the hypothesis that glyco- 
sylated receptors in intact nuclei are not accessible to 
WGA, or that glycosylated proteins are essential for the 
assembly of the receptors into the pore complex. In any 
case, this experimental system should be a powerful tool 
to identify components of the translocation apparatus. 

Nuclear localization of proteins naturally present in 
both the nucleus and cytoplasm has recently been re- 
ported to be less affected by WGA than that of exclu- 
sively nuclear proteins [24]. If confirmed, this observa- 
tion could uncover a second pathway for nuclear pro- 
tein import. 

Nuclear protein import is perhaps the only case of 
protein transport in which the physical location of the 
translocation machinery is known. EM studies have 
shown that import substrates are translocated through 
the nuclear pore complex [35,40,105,121]. Furthermore, 
a high resolution analysis of EM images has resolved 
the binding of colloidal gold coated with nucleoplasmin 
and WGA to discrete elements of the pore complex [3]. 
The element of the NPC that appears to mediate the 
actual translocation across the nuclear envelope is the 
central plug, now called the transporter. This is a 
doughnut shaped structure with an overall diameter of 
approx. 35 nm, surrounded by a ring of electron-trans- 
parent material (possibly the passive diffusion channel). 
Nucleoplasmin-coated particles on their way into the 
nucleus appear to first bind around the center of the 
transporter in an octagonally symmetrical pattern, and 
then bind to the exact center of the transporter where 
there appears to be a gated channel [3]. Current models 
on how translocation takes place envision the central 
transporter of the pore complex as a diaphragm. The 
opening of this diaphragm presumably depends on a 
signal recognition event and ATP hydrolysis. The func- 
tional diameter of the channel depends, at least partly, 
on the strength of the interaction between the signal 
sequence and the receptor site, since it is affected by the 
type and number of signal sequences present on the 
substrate [35,85]. Unlike protein import into other 
organelles, translocation across the nuclear envelope 
does not require structural flexibility in the import 
substrates, since the pore can accommodate rigid gold 
particles of up to 26 nm in diameter [35] (the estimated 
functional diameter of the free diffusion channel is 
approx. 9 nm [113]). WGA and import-inhibiting anti- 
bodies which recognize O-glycosylated NPC proteins 
could inhibit translocation by crosslinking components 
of the central transporter and thereby restricting the 
opening of the channel [3]. 

V. Translocation mechanism 

Nuclear import is clearly an active process, requiring 
both physiological temperature and ATP. At low tem- 
perature or in the absence of ATP, import substrates 
bind to the nuclear periphery but are not translocated 
across the nuclear envelope [105,121]. The requirement 
for physiological temperature has been demonstrated in 
vivo [13,121,152] and in vitro [48,73,77,96,107,114,134]. 
The ATP requirement has also been investigated in a 
number of systems. Nuclear protein import is abolished 
following depletion of ATP with apyrase in cell-free 
systems [48,77,96,105,107,108,114] or with deoxyglucose 
and sodium azide in cultured cells [121]. 

ATP is widely thought to serve as an energy source 
for import. However, a study analyzing the nucleotide 
requirements for the import of a nucleoplasmin-phyco- 
erythrin conjugate into isolated rat liver nuclei has 
reached a different conclusion [73]. This study found 
that each of four nucleotide triphosphates (ATP, GTP, 
UTP and CTP) and α,β-methylene-ATP are almost 
equally effective in fulfilling the nucleotide require- 
ment; thus it was concluded that the role of the nucleo- 
tide is to promote an energy-independent binding step 
(not the actual translocation) of the protein conjugate. 
The strongest argument that ATP is not an energy 
source was the import-promoting capability of α,β- 
methylene-ATP, which was used as a non-hydrolyzable 
ATP analog. However, this conclusion is uncertain since 
this compound has a potentially hydrolyzable γ-phos- 
phate [160]. Experiments using β,γ-imido-ATP, which 
lacks such a phosphate, have shown that at least in 
isolated yeast nuclei ATP hydrolysis is required for the 
import process [48]. 

There is only one report challenging the need for 
ATP. Heparin-treated envelopes from rat liver nuclei 
accumulate histones, nucleoplasmin and polylysine, but 
not immunoglobulins, myoglobin or cytochrome C. Ac- 
cumulation was reported to be stimulated by GTP and 
GDP but not by any other nucleotide [123]. However, 
the significance of these results is unclear, since the 
histones, nucleoplasmin and polylysine accumulated in 
the absence of GTP and the stimulation by the nucleo- 
tide was only 30%. 


VI. Regulation of nuclear protein import 

Under physiological conditions, nuclear protein lo- 
calization must be subject to complicated regulatory 
mechanisms since the presence of certain proteins in the 
nucleus is required only at very specific moments in the 
cell cycle or only in response to short-lived stimuli. A 
few examples of such regulation have been documented, 
and possible regulatory mechanisms have begun to 
emerge. 

One way to regulate import is through phosphoryla- 
tion, either of the imported protein itself or of a protein 
with which it interacts. A peptide sequence flanking the 
nuclear targeting signal of SV40 large-T antigen con- 
tains phosphorylation sites; removal of this sequence 
significantly reduces the import rate of T antigen [124], 
suggesting that efficient import of T antigen requires 
that it be phosphorylated (see section III-A). Nuclear 
localization of the yeast protein SWI5 is also regulated 
by phosphorylation; however, in this case, phosphoryla- 
tion of SWI5 prevents import (T. Moll and K. Nasmyth, 
pers. commun.). Localization of the mammalian tran- 
scriptional activator NF-κB is regulated by phospho- 
rylation of an interacting protein. NF-κB is initially 
located in the cytoplasm in an inactive form complexed 
with the inhibitory protein IκB [89]. As shown in a 
cell-free system, phosphorylation of IκB by cAMP-de- 
pendent protein kinase or protein kinase C releases 
NF-κB to move into the nucleus [52,134]. The dissocia- 
tion of IκB presumably unmasks a localization signal in 
NF-κB. 

Interaction with other proteins could be a general 
mechanism to keep otherwise nuclear proteins in the 
cytoplasm until their function is required [71]. The 
regulation of nuclear protein localization by interaction 
with cytoplasmic elements has been demonstrated for 
the bovine cAMP-dependent protein kinase II and has 
been inferred for the c-rel protein (the product of the 
cellular counterpart of the v-rel oncogene), the 
Drosophila morphogen dorsal and steroid receptors. 
cAMP-PK is a tetramer of two regulatory and two 
catalytic subunits normally associated with the Golgi 
complex. Following elevation of intracellular cAMP 
levels, the complex dissociates and the catalytic sub- 
units are imported into the nucleus, while the regulatory 
subunits remain associated with the Golgi. When the 
cAMP concentration returns to basal levels the catalytic 
subunits re-associate with their regulatory counterparts, 
indicating that nuclear import of this protein is a re- 
versible process [110]. The product of the c-rel gene is a 
cytoplasmic protein whereas the product of the v-rel 
gene is nuclear. Substitution of the carboxy terminus of 
v-rel protein with that of c-rel inhibits nuclear localiza- 
tion of the erstwhile v-rel, suggesting that the carboxy 
terminus of c-rel acts as a cytoplasmic anchor that 
prevents otherwise constitutive nuclear localization [65]. 
Similarly, the Drosophila morphogen dorsal, which is 
homologous to the c-rel protein, has carboxy-terminal 
sequences that keep it in the cytoplasm, presumably by 
interactions with the products of the genes toll or cact. 
Release of the interaction or deletion of C-terminal 
sequences permits nuclear localization of the dorsal 
protein, thereby allowing it to function in determining 
cell fate along the dorsal-ventral axis [127,130,143]. 
Steroid receptors are inactive, cytoplasmic and com- 
plexed with HSP90 in the absence of ligand [80]. Upon 
receptor activation by the appropriate steroid ligand, 
HSP90 dissociates and the receptor moves into the 
nucleus. Although a role for HSP90 in keeping the 
receptor in the cytoplasm is not clear, it could either 
mask a nuclear localization signal or interfere with a 
possibly localization-promoting phosphorylation of the 
receptor. The receptor is phosphorylated coincident with 
HSP90 dissociation. 

Although in some cases nuclear proteins are anchored 
in the cytoplasm by interacting proteins, in other cases 
interaction with a second protein (other than a compo- 
nent of the localization machinery) does not interfere or 
is actually necessary for nuclear localization. An im- 
port-deficient subunit of the homopentameric Xenopus 
nucleoplasmin can be carried into the nucleus by an 
import-proficient subunit [33]. Yeast histone H2B with 
a deleted nuclear targeting signal still becomes nuclear 
if its domain for interaction with histone H2A is intact, 
indicating that a protein can carry a different protein, 
as a heterodimer, into the nucleus [99]. Localization of 
the adenovirus 140 kDa DNA polymerase is facilitated 
by interaction with another viral protein, pTP, again 
suggesting that a protein can use another protein to 
'piggyback' into the nucleus [161]. The CDC2 protein
kinase and the CDC13 cyclin form a complex and are 
co-localized to the nucleus. In the absence of CDC13, 
CDC2 is not active as a kinase and is no longer nuclear, 
suggesting that the cyclin regulates both the enzymatic
activity and the cellular location of the kinase [12].
Finally, in some instances, it is RNA with which proteins
need to associate before gaining entry to the
nucleus. Xenopus U2 snRNA and its binding proteins
are excluded from the nucleus if present separately, but
become nuclear after assembly into ribonucleoprotein
particles [39,97].

I- Conclusion/projections 

Which areas of nuclear protein import could, or
should, receive the most attention by investigators in
the upcoming years? Clearly, the structure, function and
assembly of the transporter as well as the entire NPC is
of central importance. Given the importance and complexity
of the NPC, our understanding of this structure
is extremely rudimentary. The mechanism by which
nuclear protein localization is regulated and possibly
related to signal transduction is also intriguing. There
are hints in the literature [24,79,132] that nuclear import
involves more than one pathway; direct evidence for a
second pathway could emerge. The issue of subnuclear
localization is also in its infancy. In the near future, we
should see which, if not all, of the many NLS-binding
proteins identified so far are actual receptors. This
should, in turn, allow more sophisticated experiments
designed to probe the function of the import machinery.
We should also see greater application of genetics to the
problem of nuclear import.

Abstract. Homologues of barley Mlo encode the
only family of seven-transmembrane (TM) proteins
in plants. Their topology, subcellular localization,
and sequence diversification are reminiscent of those
of G-protein coupled receptors (GPCRs) from animals
and fungi. We present a computational analysis
of MLO family members based on 31 full-size and 3 
partial sequences, which originate from several 
monocot species, the dicot Arabidopsis thaliana, and 
the moss Ceratodon purpureus. This enabled us to 
date the origin of the Mlo gene family back at least to 
the early stages of land plant evolution. The genomic 
organization of the corresponding genes supports a
monophyletic origin of the Mlo gene family. Phylogenetic
analysis revealed five clades, of which three
contain both monocot and dicot members, while two
indicate class-specific diversification. Analysis of the
ratio of nonsynonymous-to-synonymous changes in
coding sequences provided evidence for functional
constraint on the evolution of the DNA sequences
and purifying selection, which appears to be reduced
in the first extracellular loop of 12 closely related
orthologues. The 31 full-size sequences were examined
for potential domain-specific intramolecular
coevolution. This revealed evidence for concerted
evolution of all three cytoplasmic domains with each
other and the C-terminal cytoplasmic tail, suggesting
interplay of all intracellular domains for MLO
function.

Key words: Seven-transmembrane protein — Coevolution
— Gene family — Exon/exon junctions —
Mlo — G-protein coupled receptor



Introduction 

In barley, presence of the wild-type Mlo gene modulates
defense responses to the powdery mildew
fungus, Blumeria graminis f. sp. hordei (Büschges et al.
1997). Homozygous mlo mutant plants exhibit full
resistance to the fungal pathogen, whereas Mlo
overexpression results in supersusceptibility (Wolter
et al. 1993; Kim et al. 2002b). MLO is likely to have a
role in additional biological processes since axenically
grown mlo mutant plants show accelerated leaf senescence
symptoms and a spontaneous cell death
phenotype (Wolter et al. 1993; Peterhänsel et al. 1997;
Piffanelli et al. 2002). This suggests a function for
MLO in cell death protection upon biotic stress and
leaf senescence. Two genes, Ror1 and Ror2, have been
described that are required for full mlo-mediated resistance.
Mutations in either of these genes confer
partial susceptibility in an mlo mutant background
and also compromise the spontaneous cell death
phenotype (Freialdenhoven et al. 1996; Peterhänsel et
al. 1997).

To date, MLO is the only plant polytopic membrane
protein experimentally shown to consist of seven
membrane-spanning domains (Devoto et al. 1999).
However, a further protein, the putative GPCR GCR1,
is predicted also to contain seven transmembrane (TM)
helices (Josefsson and Rask 1997; Plakidou-Dymock et
al. 1998). The barley MLO protein resides in the plasma
membrane, with the N terminus positioned extracellularly
and the C terminus intracellularly (Devoto et al.
1999). Database searches have revealed that MLO
belongs to a gene family that is restricted to the plant
kingdom. Inspection of the near-full-length Arabidopsis
genome has shown that Mlo-like genes represent
the only sequence-diversified family encoding seven-TM
(7TM) proteins in plants, while GCR1 is a single-copy
gene (Devoto et al. 1999; The Arabidopsis Genome
Initiative 2000). To date, all known animal and
fungal (including yeast) sequence-diversified protein
families with a 7TM topology function as G-protein
coupled receptors (GPCRs), which relay extracellular
signals into an intracellular response by activating a
heterotrimeric G-protein (Bockaert and Pin 1999).
Recent data, however, indicate that MLO-mediated
defense modulation functions independently
of heterotrimeric G-proteins and that calmodulin
interacts with MLO to dampen defense reactions against
the powdery mildew fungus (Kim et al. 2002b).

Here we present a thorough computational analysis 
of the MLO protein family based on a comprehensive 
set of sequences derived from Arabidopsis and maize to 
trace back the phylogenetic history of these plant- 
specific proteins. We have investigated the data set for 
the presence of domain-specific adaptive molecular 
evolution. A recently developed algorithm that allows 
the identification of protein-protein interaction pairs 
identified candidate domains that have evolved in a 
concerted manner. Our findings are consistent with a 
presumptive receptor function of MLO proteins. 

Materials and Methods 

Mlo DNA Sequences 

Mlo cDNA sequences from Arabidopsis were obtained by reverse
transcriptase polymerase chain reaction (PCR) using oligonucleotides
that were derived from the publicly available genomic sequences.
Similarly, cDNAs of TaMlo1, TaMlo2, and OsMlo2 and
genomic sequences of Mlo2 and OsMlo1 were obtained using
standard procedures (Elliott et al. 2002; further details about the
isolation of these clones will be published elsewhere). Sequence
information about Zea mays Mlo cDNAs (ZmMlo1 to -9) was
derived from corresponding expressed sequence tag (EST) clones
from the combined DuPont/Pioneer EST collection. Nucleotide
sequences of all cDNAs were determined by applying standard
techniques on ABI373/377 automated sequencers.

Phylogenetic Analyses 

Protein sequences were aligned using PileUp (Wisconsin Package 
Version 10.0; Genetics Computer Group, Madison, WI) and op- 
timized by hand. Phylogenetic analyses were performed using the 
maximum parsimony search optimality criterion of PAUP* v.4.0b8 
(Swofford 1998). Maximum parsimony analysis of protein se- 
quences was performed for (i) full-length sequences excluding N 
and C termini, (ii) all transmembrane regions only, (iii) all
extracellular and intracellular regions, (iv) all extracellular regions, and
(v) all intracellular regions. An additional analysis was performed 
for a partial sequence alignment including an MLO homologue of a 
moss, Ceratodon purpureus. Searches were performed using the 
heuristic search option and all trees were rooted using the midpoint 
rooting option. Support for the branching arrangements was 
evaluated by bootstrap analyses using 1000 replicates. 

Calculating dN/dS Ratios

To calculate the ratio of nonsynonymous-to-synonymous substitutions
(dN/dS) we used the yn00 program of PAML (Yang 1997)
implementing the method of Yang and Nielsen (2000). For these
analyses we used an alignment of 1 wheat (TaMlo2) sequence and
11 sequences derived from nine species of the genus Hordeum. The
Hordeum sequences correspond to amino acid residues 69-145 of
barley MLO, covering the first extracellular loop and some
neighboring residues, and were obtained by standard PCR amplification
using genomic DNA as template and oligonucleotides
Mlo4 5'-AAGGCGGAGCTCATGCTGGTGGGC-3' and Mlo5
5'-ACGGCTTAGAGCTATGGTGATGAC-3' as primers. Amplification
products (~350-400 bp, including one intron) were
purified on agarose gels, subcloned in pGEM-T Easy (Promega),
and subjected to sequence analysis. We dissected the resulting
nucleotide sequences (excluding primer and intron sequences) into
three parts that were investigated separately: (i) the whole stretch,
corresponding to amino acids 69-145 of barley MLO, (ii) extracellular
loop 1 excluding the region between conserved cysteine
residues 86 and 114, and (iii) the region between conserved cysteine
residues 86 and 114. The yn00 program calculates dN/dS ratios for
each pairwise comparison. We then summarized these as an average
dN/dS ratio for each region (excluding ratios that had a zero
value for either dN or dS) to compare differences in the rate of
amino acid substitution among the three regions.
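
The averaging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name average_dn_ds and the input values are hypothetical, the pairwise dN and dS estimates are assumed to have been parsed from yn00 output beforehand, and the use of the sample standard deviation is a guess, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def average_dn_ds(pairwise):
    """Average dN/dS over pairwise comparisons.

    `pairwise` is a list of (dN, dS) tuples, one per sequence pair.
    Pairs with a zero value for either dN or dS are skipped, as in the text.
    Returns (mean ratio, sample standard deviation, number of pairs used).
    """
    ratios = [dn / ds for dn, ds in pairwise if dn > 0 and ds > 0]
    if not ratios:
        raise ValueError("no pair has non-zero dN and dS")
    # Sample standard deviation is an assumption; one value has no spread.
    spread = stdev(ratios) if len(ratios) > 1 else 0.0
    return mean(ratios), spread, len(ratios)


# Hypothetical pairwise estimates, for illustration only.
example_pairs = [(0.02, 0.10), (0.00, 0.08), (0.03, 0.00), (0.01, 0.10)]
avg, sd, used = average_dn_ds(example_pairs)
print(f"mean dN/dS = {avg:.3f}, sd = {sd:.3f}, pairs used = {used}")
```

Running the same function once per region, (i) to (iii), would give the three averages that are compared in the Results.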

Coevolution Analysis 

The correlation analysis was done on every possible domain-domain
pair using methods described previously (Goh et al. 2000).
Distance matrices were generated from the multiple alignments
using ClustalW (Thompson et al. 1994). We employed a linear
regression analysis measuring the correlation between pairwise
evolutionary distances among all peptides in a multiple sequence
alignment. These were correlated with the evolutionary distances
among the corresponding binding partners using the linear
correlation coefficient r (Pearson's correlation coefficient; Press et al.
1998) between the distance matrices of all possible interacting
domains, where -1 ≤ r ≤ +1. Positive values of r indicate a positive
correlation, and r values around zero indicate no correlation.
Additionally, negative values of r indicate anticorrelation.

Table 1. Compilation of Mlo homologues

                           GenBank accession No.
Gene         Organism      cDNA       Genomic     Genome position     Introns  Amino acids
AtMlo1       A. thaliana   Z95352     At4g02600   Chr. IV, 15 cM      11       526
AtMlo2       A. thaliana   AF369563   At1g11310   Chr. I, 10 cM       13       573
AtMlo3       A. thaliana   AF369564   At3g45290   Chr. III, 61 cM     14       508
AtMlo4       A. thaliana   AF369565   At1g11000   Chr. I, 10 cM       14       570
AtMlo5       A. thaliana   AF369566   At2g33670   Chr. II, 76 cM      14       500
AtMlo6       A. thaliana   AF369567   At1g61560   Chr. I, 84 cM       13       583
AtMlo7       A. thaliana   AF369568   At2g17430   Chr. II, 32 cM      13       542
AtMlo8       A. thaliana   AF369569   At2g17480   Chr. II, 32 cM      14       593
AtMlo9       A. thaliana   AF369570   At1g42560   Chr. I, 62 cM       14       460
AtMlo10      A. thaliana   AF369571   At5g65970   Chr. V, 128 cM      14       569
AtMlo11      A. thaliana   AF369572   At5g53760   Chr. V, 100 cM      14       565
AtMlo12(a)   A. thaliana   AF369573   At2g39200   Chr. II, 72 cM      14       576
AtMlo13(b)   A. thaliana   AF369574   At4g24250   Chr. IV, 83 cM      13       478
AtMlo14      A. thaliana   AF369575   At1g26700   Chr. I, 38 cM       14       550
AtMlo15      A. thaliana   AF369576   At2g44110   Chr. II, 78 cM      13       496
CpMlo        C. purpureus  AW087034   -           n.d.                -        p.s.
Mlo          H. vulgare    Z83834     Y14573      Chr. IV             *        533
Mlo2         H. vulgare    *          *           Chr. IV             *        544
OsMlo1       O. sativa     *          *           Chr. VI             *        540
OsMlo2       O. sativa     *          *           Chr. III            *        555
OsMlo3       O. sativa     *          *           n.d.                *        554
OsMlo4       O. sativa     *          *           Chr. X              *        580
TaMlo1       T. aestivum   AX063298   -           n.d.                -        534
TaMlo2       T. aestivum   AX063294   -           n.d.                -        534
TaMlo3       T. aestivum   AX063296   -           n.d.                -        534
ZmMlo1       Z. mays       AY029312   -           Chr. I, bin 1       -        563
ZmMlo2       Z. mays       AY029313   -           Chr. I, bin 4       -        565
ZmMlo3       Z. mays       AY029314   -           Chr. II, bin 4      -        496
ZmMlo4       Z. mays       AY029315   -           Chr. III, bin 5     -        509
ZmMlo5       Z. mays       AY029316   -           Chr. III, bin 6     -        p.s.
ZmMlo6       Z. mays       AY029317   -           Chr. V, bin 4/5     -        515
ZmMlo7       Z. mays       AY029318   -           Chr. IX, bin 4      -        499
ZmMlo8       Z. mays       AY029319   -           Chr. VI, bin 5-7    -        492
ZmMlo9       Z. mays       AY029320   -           n.d.                -        p.s.

Note: p.s., partial sequence; -, genomic or cDNA sequence not available;
n.d., not determined.
(a) Formerly designated AtMlo18 (Devoto et al. 1999).
(b) Formerly designated AtMlo20 (Devoto et al. 1999).
* Row assignment not recoverable from the scanned table. Accession
numbers for Mlo2 and OsMlo1 to OsMlo4, in the order scanned: Z95496,
Z95353, AF384030, AP000615, AF388195, AC073166. Intron counts for
the genes outside Arabidopsis, in the order scanned: 11, 11, 12, 12, 14.
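
The correlation step of the coevolution analysis can be illustrated with a short Python sketch. It is not the implementation of Goh et al. (2000); it only shows how Pearson's r can be computed between two distance matrices that list the same sequences in the same order. The function names and the example matrices are hypothetical.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson_r(x, y):
    """Pearson's linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("distances are constant; r is undefined")
    return cov / sqrt(vx * vy)


def domain_correlation(dist_a, dist_b):
    """Correlate pairwise distances of domain A with those of domain B.

    Both matrices must cover the same sequences in the same order.
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("matrices must have the same dimensions")
    return pearson_r(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical distance matrices for three sequences, illustration only.
loop_a = [[0.0, 0.1, 0.4], [0.1, 0.0, 0.3], [0.4, 0.3, 0.0]]
loop_b = [[0.0, 0.2, 0.8], [0.2, 0.0, 0.6], [0.8, 0.6, 0.0]]  # 2 x loop_a
loop_c = [[0.0, 0.9, 0.6], [0.9, 0.0, 0.7], [0.6, 0.7, 0.0]]  # 1 - loop_a

r_positive = domain_correlation(loop_a, loop_b)
r_negative = domain_correlation(loop_a, loop_c)
print(round(r_positive, 3), round(r_negative, 3))
```

Only the off-diagonal entries above the diagonal are used, so each sequence pair contributes once and the zero self-distances do not inflate the correlation.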



Results and Discussion 

Phylogenetic Analysis of Mlo-like Genes Suggests an
Origin in the Early Stages of Land Plant Evolution

Previously, we described the existence of Mlo-like
sequences in different monocot and dicot species
(Devoto et al. 1999). In the meantime, further genomic
sequences and ESTs sequence-related to barley
Mlo were released. By searching the public databases
using the BLAST or PSI-BLAST algorithms
(http://www.ncbi.nlm.nih.gov/BLAST/), Mlo-like genes
were identified in an even broader range of monocotyledonous
(Hordeum vulgare, Oryza sativa, Secale
cereale, Triticum aestivum, Zea mays) as well as
dicotyledonous plant species (Arabidopsis thaliana,
Brassica rapa, Citrullus lanatus, Glycine max, Gossypium
hirsutum, Linum usitatissimum, Lotus japonicus,
Lycopersicon esculentum, Medicago truncatula, Solanum
tuberosum, Sorghum bicolor). Multiple distinct
genes were found in most of these species, indicating
their organization into multigene families.

Recently, the nearly full genomic sequence of
Arabidopsis thaliana was released (The Arabidopsis
Genome Initiative 2000), covering more than 90% of
the 125-Mb genome of the weed. Based on these data,
we identified 15 distinct members for which full-length
genomic sequences are known (Table 1). The
remaining 10 Mb of the Arabidopsis genomic sequence
is supposed to cover mainly rDNA repeat
units and centromeric and telomeric regions as well as
other regions of complex sequence structure that are
unlikely to harbor many coding sequences. Thus, we
conclude that the 15 Arabidopsis Mlo homologues
identified to date are likely to represent the actual
number. A former estimate of 25-35 homologues
(Devoto et al. 1999) is apparently due to an overrepresentation
of Mlo homologues in early released
sequences of the Arabidopsis genome. The designation
of the 15 genes is given in Table 1 (see also
http://www.arabidopsis.org/info/genefamily/mlo.html). Only
eight of these are currently represented by corresponding
ESTs in GenBank, indicating their generally
low expression levels. However, we were able to
isolate matching cDNAs for all members by reverse
transcriptase PCR. Subsequent DNA sequencing
confirmed the identity of the clones, demonstrating
that all 15 members are expressed, albeit at low levels
(Table 1 and data not shown).

To identify Mlo family members in the monocot- 
yledonous plant Zea mays, we searched the Pioneer/ 
DuPont maize EST database, which to date com- 
prises 400,000 ESTs. Nucleotide sequences of nine 
distinct Mlo genes were identified in this database 
(seven of which are full-length), indicative of a similar 
total number of Mlo genes in maize and Arabidopsis. 
As in Arabidopsis, most of the maize genes are expressed
either at a low level or preferentially in particular
tissues (data not shown).

Except for barley Mlo, no biological function has
been assigned to any other Mlo-like gene to date. We
have isolated cDNAs from wheat and rice that are
exceptionally similar to barley Mlo. Due to their
syntenic genomic locations relative to the barley gene
on chromosome 4H, these members are likely to be
orthologues (Elliott et al. 2002). In single-cell transfection
experiments of barley mlo mutants (Shirasu et
al. 1999), OsMlo1 and TaMlo2 showed either full
(TaMlo1) or partial (OsMlo2) complementation, indicating
that during evolution the function of these
orthologues was preserved (Elliott et al. 2002). A
comprehensive list of all 34 members analyzed here is
shown in Table 1.

Phylogenetic analysis performed on 31 MLO full-length
protein sequences identifies six subfamilies
comprising five clades (I-V), with strong bootstrap
support for the monophyly of each clade, and a single
divergent lineage (AtMLO3; Fig. 1). There is also
strong bootstrap support for a sister group relationship
between subfamily I and subfamily II, while
relationships among the remaining subfamilies are
unresolved. With a few exceptions, phylogenetic
analyses of specific regions of the Mlo genes also recover
these six subfamilies with moderate to high
bootstrap support (Table 2). On average, subfamily
members exhibit 45% identity and 70% similarity at
the amino acid level. Interestingly, subfamily IV
comprises only monocot homologues, including the
presumptive orthologues from barley, wheat, and
rice. Similarly, three Arabidopsis members (AtMLO2,
AtMLO6, and AtMLO12) cluster together and define
subfamily V, which appears to be restricted to dicots
(or, alternatively, to Arabidopsis) given the fact that
the analysis of 400,000 maize ESTs failed to reveal
members of this gene cluster. The results of the
phylogenetic analysis support an early evolutionary
diversification of the MLO subgroups, well before the
origin of monocots and dicots. MLO homologues of
Arabidopsis and Zea mays are highly divergent, with
representatives in clades I, II, III, and V and clades I,
II, III, and IV, respectively. Maintenance of these
subfamilies (clades) may indicate preservation of an
early functional diversification. Whether monocot-
and dicot-specific clades IV and V emerged after the
separation of these two classes or whether members
of these clades were lost subsequently remains
unclear.

Since monocots are believed to have diverged from
dicots approximately 100-270 million years ago
(Wolfe et al. 1989; Schneider-Poetsch et al. 1998),
Mlo-like genes must have already existed in their
common progenitor. In fact, it would appear that this
gene family is much older than the monocots and dicots.
The monocot and dicot MLO sequences
AtMLO4/ZmMLO4 and AtMLO1/ZmMLO8 group
together as sister homologues with bootstrap values of
100 and 70, respectively (nodes A and B in Fig. 1).
Unless these relationships are the result of horizontal
gene transfer, the ages of these two nodes can be no
younger than the 100 million- to 270 million-year
divergence time between monocots and dicots. Several
ESTs have been identified for the gymnosperm Pinus
taeda, demonstrating the presence of Mlo homologues
in both subphyla of the spermatophyta (seed plants),
angiosperms and gymnosperms, which are believed to
have diverged from a common ancestor about 340-360
million years ago (Wolfe et al. 1989; Troitsky et al.
1991). Moreover, several ESTs (~20 of ~65,000) with
a high sequence similarity to Mlo originate from the
bryophyte Physcomitrella patens, and one (of ~1,700
ESTs) from the moss Ceratodon purpureus. A maximum
parsimony analysis of an alignment based on the
regions corresponding to the partial C. purpureus sequence
(68 amino acids of the C terminus; Fig. 1)
shows this sequence to fall within the diversity of
monocots and dicots, with moderate bootstrap support
for its placement within subfamily I. Bryophytes
and tracheophytes (vascular plants) are believed to
have diverged early in the evolution of green land
plants, between the mid-Ordovician and the early Silurian
period, approximately 400-450 million years
ago (Wolfe et al. 1989; Kenrick et al. 1997). Thus,
unless this is the result of horizontal gene transfer, a
common ancestor of both must already have possessed
an Mlo homologue and the node uniting
CpMLO1/AtMLO4/ZmMLO4 (node C in Fig. 1) can
be no younger than the 400 million- to 450 million-year
divergence time between bryophytes and tracheophytes.

Fig. 1. Maximum parsimony phylogenetic
analysis of amino acid sequence data for monocot
and dicot MLO family members. Maximum parsimony
tree constructed from full-length amino
acid sequence data for MLO genes, excluding N
and C termini. Branch lengths are proportional to
the amount of amino acid changes. Numbers at
the nodes indicate bootstrap support values (1000
replicates) above 60. Roman numerals denote
major clades (subfamilies) referred to in the text.
Nodes A and B indicate monocot and dicot sister
lineages (dashed lines) referred to in the text. Inset:
The phylogenetic position (node C) of the bryophyte
MLO sequence (CpMLO1) from a maximum
parsimony analysis of an alignment of
partial sequences corresponding to the 68 amino
acids of CpMLO1. The analysis included all MLO
sequences in the partial alignment, but for clarity
only clade I containing CpMLO1 is shown.

Table 2. Bootstrap support values (1000 replicates) for monophyly of clades I-V from maximum parsimony analyses of specific regions of
MLO protein sequences

                                         Bootstrap support value
MLO region analyzed                      Clade I  Clade II  Clade III  Clade IV  Clade V
Full protein excluding N and C termini   100      99        92         100       100
Intra- and extracellular regions         100      100       78         100       100
Transmembrane regions                    87       73        59         99        100
Intracellular regions                    99       90        66         95        100
Extracellular regions                    95       55        <50        99        100

We conclude from this observation that the presence
of Mlo genes can be traced back at least to the
early evolutionary stages of land plant development.
This implies an ancient and vital function for the
MLO family in plants. EST database searches
(http://www.kazusa.or.jp/en/plant) of the unicellular green
alga Chlamydomonas reinhardtii (37,990 ESTs) and
the marine red alga Porphyra yezoensis (10,154 ESTs)
detected no Mlo-like sequences in these two species.
This could be the first evidence that Mlo emerged
concurrently with the conquest of terrestrial habitats,
although we cannot rule out the possibility that the
number of currently available algal ESTs is too low
to identify Mlo-like sequences.

Closely related members belonging to the same
subfamily but originating from different species may
be identified as orthologues with similar functions, as
demonstrated experimentally for MLO, TaMLO2,
and OsMLO2 (see above; Elliott et al. 2002). Whether
the observed clustering correlates generally with
a common function of the members is currently under
investigation.

A Common Scaffold Topology Accommodates Two
Hypervariable Domains

A hallmark of all MLO family members is the pres- 
ence of seven TM domains. The predictions obtained 
for each of the full-size family members from Table 1 
using the TMHMM algorithm (Sonnhammer et al.
1998) exactly matched the 7TM topology determined
experimentally for the barley MLO protein (Devoto
et al. 1999). Similarly, the predicted distribution of
the amino acid residues with respect to the membrane 
is comparable to that for the barley protein: generally 
50-60% of the protein is predicted to be cytoplasmic, 
20-30% to be embedded in the membrane, and the 
rest is thought to be extracellular/lumenal. These 
observations indicate a shared scaffold topology for 
all MLO protein family members, consisting of seven 
TM helices, an N-terminal extracellular or lumenal 
end, three cytoplasmic and three extracellular/lu- 
menal loops, and a cytoplasmic C-terminal tail (Fig. 
2). Although a rice MLO homologue has also been 
shown to reside within the plasma membrane (Kim et 
al. 2002a), the scaffold topology does not provide 
conclusive evidence for a common subcellular local- 
ization. For simplicity, we refer in the following to 
"extracellular" rather than to "extracellular/lumen- 
al" domains. 

Another characteristic is the presence of four
strictly conserved cysteine residues in extracellular
loops 1 and 3 (Fig. 2). If these cysteine residues form
(a) disulfide bridge(s) either with each other or with
the two other invariant extracellular cysteines, this
domain could subsequently form an exposed loop/
ligand binding site. This is frequently found in
mammalian 7TM receptors to stabilize the relative
arrangement of the TM helices to each other (Probst
et al. 1992; Strader et al. 1994). Extraordinary length
variability occurs between cysteine residues 99 and
115 in extracellular loop 1, contributing to an exceptional
sequence variation in this region among
family members (Fig. 3A). The C terminus defines the
second domain, which is highly variable in both sequence
and length (ranging from 55 to 253 amino
acid residues; Fig. 3B). However, the first ~25 residues
proximal to TM VII are rather conserved, harboring
the recently discovered calmodulin binding
site present in MLO proteins (Fig. 2) (Kim et al.
2002a, b). A hallmark of this binding site is a strictly
conserved tryptophan residue that has been demonstrated
to be essential for the interaction with calmodulin
(Figs. 2 and 3B) (Kim et al. 2002a, b).

Fig. 2. Scheme of the MLO proteins.
Gray boxes designate the seven TM helices.
Arrowheads indicate the position of splice
junctions (exon/exon junctions at the protein
level), with the corresponding introns
numbered with Roman numerals. C, M,
and W denote conserved cysteine, methionine,
and tryptophan residues, respectively.

Sequence Diversity in Extracellular Loop 1 
and Reduced Functional Constraint 

The comparatively high level of sequence variability
observed in extracellular loop 1 can be interpreted in
two ways: either this region determines specificity of
individual MLO members by creating unique binding
sites for putative ligands or this region has no isoform-specific
function but serves as a structural component
of the 7TM family. In the latter case, the observed
sequence variability would be the result of evolution by
random drift, while in the former, it would reflect selection
toward isoform specificity. To distinguish between
these alternatives, the ratio dN/dS of
nonsynonymous (amino acid-changing; dN) to synonymous
(silent; dS) substitutions per nonsynonymous
and synonymous site is a suitable indicator. Pseudogenes
without any evolutionary selective pressure will
accumulate neutral and amino acid-changing substitutions
in their DNA sequence at the same frequency,
resulting in a dN/dS ratio of ~1. In contrast, in the
majority of genes most of the occurring nonsynonymous
changes are probably deleterious, resulting in
purifying counter-selection. In these cases, synonymous
substitutions take place more often than nonsynonymous
ones, resulting in a dN/dS ratio of <1. As
a third possibility, certain coding regions are selected
for extraordinarily high rates of nonsynonymous substitutions
(resulting in a dN/dS ratio of >1). This behavior
is true for fast-evolving genes that undergo
adaptive molecular evolution as, for example, several
surface antigens of pathogens and the matching defense
systems in the corresponding hosts (Yang and
Bielawski 2000). Since this method provides reliable
results only if the sequences investigated are neither
too similar nor too different (Yang and Bielawski
2000), we first had to select suitable sequences. Known
full-size MLO sequences were unsuitable because they
are highly divergent in extracellular loop 1 (Fig. 3A).
We PCR-amplified a fragment of the Mlo genomic
sequence (corresponding to extracellular loop 1 and
some flanking amino acid residues) from eight species
of the genus Hordeum (Materials and Methods; Table
3 and Fig. 4). In two cases, we obtained two distinct
sequences each, likely reflecting the polyploid nature of
these species. The resulting predicted amino acid sequences
are only moderately divergent in extracellular
loop 1 and thus ideally suited for dN/dS analyses
(compare Figs. 3A and 4).

Fig. 3. Multiple sequence alignment of MLO proteins. A Alignment
of amino acid sequences corresponding to the first extracellular
loop. B Alignment of amino acid sequences corresponding to
TM VII and the C-terminal tail. Sequences were aligned using
PileUp (Wisconsin Package Version 10.0; Genetics Computer
Group, Madison, WI); spaces were manually introduced to increase
the similarity in the alignment. Shading indicates the degree
of conservation between amino acids. Identical amino acid residues
(100% conserved) are shaded in black; 80% or greater conserved,
60% or greater conserved, and less than 60% conserved residues are
shaded in dark gray, light gray, and white, respectively. Numbers
indicate amino acid positions within the protein; asterisks indicate
conserved cysteine residues; black and gray triangles indicate the
methionine corresponding to the start of the last exon and the
conserved tryptophan residue of the calmodulin binding domain,
respectively.

We calculated dN/dS ratios for each possible pair 
of 10 amplified and 2 previously known sequences of 
(i) the region corresponding to amino acid residues 
69-145 of barley MLO, (ii) extracellular loop 1 
excluding the region between conserved cysteines 86 
and 114, and (iii) the region between conserved 
cysteines 86 and 114. The average dN/dS ratio values for 
each of these are 0.138, σ = 0.048 (i); 0.154, σ = 0.054 
(ii); and 0.275, σ = 0.170 (iii). All of these values are 
well below 1, indicating functional constraint on the 
evolution of the DNA sequences and purifying 
selection. However, it appears that functional 
constraint is less for the region between conserved 
cysteine residues 86 and 114 in extracellular loop 1, as 
the average dN/dS ratio for this section is almost two 
times higher than that of its 5' and 3' flanking 
sequences (0.275 versus 0.154, respectively), although 
this difference is not statistically significant. This 
result can be interpreted in two ways. It may indicate 
that relaxed constraint in DNA evolution causes, 
over long periods of time, the sequence variation 
found among compiled MLO family members. 
Alternatively, in this particular case, the dN/dS ratio 
might not be a reliable indicator for the molecular 
mechanism leading to the observed variability. 
It will be interesting to find out whether differences 
in this region correspond to the ability to bind 
diverse interacting partners in the extracellular 
space. 
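
The averaging over sequence pairs described above can be sketched as follows. This is a minimal illustration only, assuming the pairwise dN and dS estimates have already been produced by a separate program. The function name, the dictionary layout, and the toy values are hypothetical and are not data from the analysis. The text does not state whether σ is a sample or a population standard deviation; the sketch uses the sample form.

```python
from itertools import combinations
from math import comb
from statistics import mean, stdev


def pairwise_ratio_summary(names, dn, ds):
    """Mean and sample SD of dN/dS over every unordered pair of sequences.

    dn and ds map frozenset({a, b}) to precomputed substitution estimates.
    Pairs with dS == 0 are skipped because the ratio is undefined.
    """
    ratios = []
    for a, b in combinations(names, 2):
        key = frozenset((a, b))
        if ds[key] > 0:
            ratios.append(dn[key] / ds[key])
    return mean(ratios), stdev(ratios), len(ratios)


# Twelve sequences (10 amplified plus 2 previously known) give 66 pairs.
n_pairs_for_12 = comb(12, 2)

# Toy input: illustrative values only, not taken from the study.
names = ["seqA", "seqB", "seqC"]
dn = {
    frozenset(("seqA", "seqB")): 0.010,
    frozenset(("seqA", "seqC")): 0.020,
    frozenset(("seqB", "seqC")): 0.015,
}
ds = {
    frozenset(("seqA", "seqB")): 0.100,
    frozenset(("seqA", "seqC")): 0.100,
    frozenset(("seqB", "seqC")): 0.100,
}
avg, sd, n = pairwise_ratio_summary(names, dn, ds)
```

An average well below 1, as reported for all three regions, is what the text reads as purifying selection.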
Fig. 3. (Continued) 



Structural Organization of Mlo Genomic Sequences 
Provides Further Evidence for a Monophyletic Origin 
of the Gene Family 

A comparison of the gene structure among available 
full genomic sequences of family members revealed 11 
to 14 introns per Mlo gene (Table 1). Most of the 
introns are 80 to 90 nucleotides in size, with no 
sequence conservation even among phylogenetically 
closely related members. Notably, in all but 
one case the exon/exon junctions map to 
identical positions at the corresponding protein level, 
supporting a monophyletic origin for the gene family 
(Fig. 2). The only exception is represented by intron 
Table 3. Sequences of Hordeum species used for the dN/dS analysis 

Species                       Ploidy(a)                           GenBank accession No.(s) 
H. vulgare                    Diploid                             Z83834 
H. vulgare f. agriocrithon    Diploid                             AY090646 
H. vulgare ssp. spontaneum    Diploid                             AY090647 
H. brevisubulatum             Diploid, tetraploid, and hexaploid  AY090638, AY090639 
H. bulbosum                   Diploid and tetraploid              AY090641, AY090642 
H. chilense                   Diploid                             AY090643 
H. jubatum                    Tetraploid                          AY090640 
H. murinum ssp. murinum       Tetraploid                          AY090645 
H. murinum ssp. leporinum     Tetraploid and hexaploid            AY090644 

(a) According to von Bothmer et al. (1995). 




Fig. 4. Amino acid sequence alignment of MLO sequences used 
for the dN/dS analysis. Amino acid sequences corresponding to 
extracellular loop 1 and flanking regions from 11 presumptive 
orthologues of nine species of the genus Hordeum and a wheat 
sequence were aligned using ClustalW. Identical amino acid residues 

V, which is located at a slightly different position in 
AtMlo1, -13, and -15. Intron VI is absent in AtMlo2 
and AtMlo6, while intron XI is missing in AtMlo1, 
-13, and -15. These observations are in full agreement 
with the phylogenetic analysis (see above and Fig. 1), 
suggesting that highly similar members within 
Arabidopsis did not arise by convergence from different 
progenitor sequences but diverged from a single 
common ancestor gene. The C-terminal tails are 
always encoded by a single exon, invariably starting 
with a consensus translational initiation sequence 
including the start codon ATG (Figs. 2 and 3B). 
Whether this reflects an ancient gene shuffling event 
remains speculative. 

The splice junctions in the gene family map mainly 
to the boundaries between the encoded loop and 
transmembrane regions (Fig. 2). Eight of the 14 
exon/exon junctions are located proximal or distal to the 
transmembrane helical termini. Only one TM helix 
(VI) is interrupted by a splice junction. The remaining 
junctions are located within extracellular loop 1, intra- 
and extracellular loop 2, and TM helix VI. No 
exon/exon junction was observed in the amino- and the 
carboxyl-terminal ends of the family members 
proximal to the first TM helix or distal to TM VII. The fact 
that individual TM helices are encoded by single exons 
is common to other polytopic membrane proteins 
(Argos and Rao 1985; Miao and Verma 1993). This is 




(100% conserved) are shaded in black; 80% or greater conserved, 
60% or greater conserved, and less than 60% conserved residues are 
shaded in dark gray, light gray, and white, respectively. The two 
asterisks indicate conserved cysteines that are at positions 86 and 
114 in barley MLO. 

thought to reflect their role as an evolutionary unit that 
is subject to severe selection constraint to maintain the 
structurally stable, multihelical TM core. Such a unit 
may serve as a module to create variability in the 
number of TM helices of polytopic membrane proteins 
(e.g., by exon shuffling). 



AtMlo Distribution in the Arabidopsis Genome 

It has been demonstrated recently that most of the 
genome of Arabidopsis thaliana is internally 
duplicated, suggesting that Arabidopsis may be an ancient 
tetraploid species (Blanc et al. 2000; The Arabidopsis 
Genome Initiative 2000; Vision et al. 2000). 
Additionally, Vision et al. (2000) provided evidence that the 
current state of the Arabidopsis genome may result 
from at least four large-scale duplication events that 
took place 100 to 200 million years ago. These 
duplication processes must have also involved chromosome 
fusions resulting in extended genomic regions in which 
the number, order, and orientation of duplicated genes 
are preserved. After duplication, affected regions were 
subject to extensive subchromosomal rearrangements, 
such as inversions, translocations, and loss or 
transposition of single genes or groups of genes. 

We investigated the distribution of AtMlo genes in 
extended duplicated genomic regions to identify 
putative functionally redundant copies of AtMlo genes. 
For this analysis we used the template map of 
Arabidopsis genomic duplications described by Blanc et al. 
(2000) because the start and end of the copied regions 
are exactly designated by particular BAC clones. We 
found that Mlo genes are located on all five 
chromosomes without any obvious clustering. With two 
exceptions (AtMlo9 and -15), all AtMlo genes are located 
within regions that are thought to have undergone a 
previous large-scale duplication event (Fig. 5). 
Unexpectedly, AtMlo genes were always found as a single 
copy in the duplicated areas, except AtMlo2 and 
AtMlo6, for which the number, order, and orientation of 
flanking genes are rather conserved. Although it is 
known that less than half of the genes (37-47%, 
depending on the significance criteria) in the duplicated 
areas are conserved in their corresponding copy region 
(Bancroft 2001), AtMlo genes behave differently in 
that only a single duplication is recognizable. 
Whether this indicates constraints in copy numbers or 
exceptionally high micro-translocation/deletion 
events cannot be resolved. Taken together, this 
approach identifies only AtMlo2 and AtMlo6 as the 
result of a large-scale duplication event. It will be 
interesting to find out whether these two genes are 
functionally redundant or whether the few sequence 
differences lead to functional diversification. 

genome duplications. Relative positions of the 15 AtMlo genes are shown. (Adapted from Blanc et al. 2000). 

Coevolution Among Domains of MLO Proteins 

Recently, Goh et al. (2000) have developed an 
algorithm that allows the identification of protein-protein 
interaction pairs and can be adapted to the 
assessment of intramolecular coevolution of peptide 
domains within a single protein family. The method is 
based on the assumption that if there are two 
domains within a single protein that have to act 
cooperatively for proper function, evolutionary changes of 
the amino acid sequence within one of the domains 
will either result in counter-selection or in 
compensating changes in the amino acid sequence of the 
other domain. In terms of evolution, these two 
domains will evolve in a coordinated manner, resulting 
in a linked phylogenetic relationship. If there is no 
cooperation between the two domains, they are 
believed to evolve independently, resulting in an 
unlinked phylogenetic relationship. The algorithm has 
been used by Pazos and Valencia (2001) to test the 
impact of the method by analyzing potential 
intramolecular interactions of structural domains in 
bipartite proteins and by investigating known 
protein-protein interaction pairs. The authors conclude from 
their results that the procedure is capable of detecting 
true interactions in >66% of the cases if a correlation 
>0.8 is detected. 
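
The idea of a linked phylogenetic relationship can be illustrated with a short sketch. It assumes that each domain is summarized by the pairwise distances among its aligned segments and that two domains are compared by the linear (Pearson) correlation of those distances. The simple p-distance, the function names, and the toy sequences are illustrative assumptions, not the procedure given in Materials and Methods.

```python
from itertools import combinations
from math import sqrt


def p_distance(a, b):
    """Fraction of differing positions between two aligned segments."""
    if len(a) != len(b):
        raise ValueError("segments must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_vector(segments):
    """Pairwise distances for all protein pairs, in a fixed order."""
    idx = range(len(segments))
    return [p_distance(segments[i], segments[j])
            for i, j in combinations(idx, 2)]


def pearson(x, y):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def domain_correlations(domains):
    """domains maps a domain name to its segments, one per protein,
    with the proteins listed in the same order for every domain."""
    vectors = {name: distance_vector(s) for name, s in domains.items()}
    return {(a, b): pearson(vectors[a], vectors[b])
            for a, b in combinations(vectors, 2)}


# Toy segments for four hypothetical proteins (not real MLO data).
toy_domains = {
    "IC2": ["AAAA", "AAAT", "AATT", "TTTT"],
    "IC3": ["CCCC", "CCCG", "CCGG", "GGGG"],
    "EC1": ["KKKK", "KKKK", "RRRR", "KKKK"],
}
corr = domain_correlations(toy_domains)
```

In the toy input, IC2 and IC3 diverge in step across the four proteins and therefore correlate strongly, whereas EC1 follows an unrelated pattern.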

We dissected 31 full-length sequences of MLO 
proteins into their single peptide domains. This pro- 
cedure resulted in 15 sets of peptide sequences, rep- 
resenting the N and C termini, the seven TM regions, 
the three cytoplasmic, and the three extracellular 
loops. We paired each set of peptide sequences with 
each other and calculated correlation coefficients for 
all 105 possible pairings (Fig. 6 and Materials and 
Methods). The observed correlation coefficients were 
Fig. 6. Interdomain correlation analysis 
of MLO proteins. Correlation coefficients 
of all 105 interdomain pairings of the 15 
sets of peptide domains from 31 MLO 
proteins were plotted against the relative 
ranking (ranging from 1 to 105) of the 
respective pair. Mean values and 1.96 × 
standard deviations are indicated by the bold 
horizontal line and dotted horizontal lines, 
respectively. 



in the range of 0.15 to 0.85, with an average of 0.51 
and a standard deviation of 0.15 (Fig. 6). In Table 4 
we have listed the top five pairings with the highest 
correlation coefficients, which we will discuss in 
detail. All of them have values close to or even above 
the 1.96 times standard deviation boundary (marking 
significant values with a probability value of 
p < 0.05). This indicates that coevolution between the 
respective peptide domains is likely. Among these top 
five pairs, the three possible combinations among the 
cytoplasmic domains IC2 and IC3 and the C 
terminus have the highest scores (Table 4), about 0.8, a 
value that has been suggested to be a good empirical 
cutoff to indicate with a high probability true positive 
interactions (Pazos and Valencia 2001). The next 
two pairs also indicate possible coevolution 
of IC1 with loops IC2 and IC3 (Table 4). Taken 
together, the analysis provides evidence for 
coevolution of all cytoplasmic loops with the C terminus, 
showing a particular emphasis on IC2, IC3, and the C 
terminus. Probable coevolution between the 
cytoplasmic domains of MLO suggests interplay of these 
domains and interaction with a putative partner(s) 
for MLO protein function. Although other scenarios 
are possible, the most likely interpretation is related 
to a conserved interaction of the cytoplasmic 
domains with a common binding partner. An analogous 
situation has been demonstrated experimentally for 
the well-characterized family of GPCRs in binding 
heterotrimeric G-proteins (reviewed by Hamm 1998). 
The relative absence of correlations joining the 
extracellular domains could relate to the heterogeneity 
of presumptive ligands that might bind and activate 
MLO proteins. 
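
The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the upper boundary is the mean plus 1.96 standard deviations, which is one reading of the Fig. 6 legend. The numbers are those given in the text and in Table 4; the variable names are illustrative.

```python
from math import comb

# 15 domain sets give 105 unordered pairings.
n_pairs = comb(15, 2)

mean_r = 0.51  # average correlation coefficient (from the text)
sd_r = 0.15    # standard deviation (from the text)

# Assumed boundary: mean + 1.96 SD = 0.51 + 0.294 = 0.804
upper_bound = mean_r + 1.96 * sd_r

# Top five pairings as listed in Table 4.
table4 = {
    "IC3/C terminus": 0.85,
    "IC2/IC3": 0.82,
    "IC2/C terminus": 0.79,
    "IC1/IC2": 0.78,
    "IC1/IC3": 0.77,
}

above = [pair for pair, r in table4.items() if r > upper_bound]
close = [pair for pair, r in table4.items() if r <= upper_bound]
```

Under this assumption two pairings exceed the boundary and the other three fall just short of it, consistent with the description "close to or even above".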

GenBank Accession Numbers 

GenBank accession numbers for newly deposited 
sequences are as follows: Z95353 (OsMlo1), AF384030 
(OsMlo2), AF361933 (TaMlo1), AF361932 



Table 4. Correlation coefficients of the coevolution analysis of 
MLO protein domains 

Rank  Pair            Correlation coefficient 
1     IC3/C terminus  0.85 
2     IC2/IC3         0.82 
3     IC2/C terminus  0.79 
4     IC1/IC2         0.78 
5     IC1/IC3         0.77 

Note. IC, intracellular loop. 



(TaMlo2), Z95352 and AF369563-AF369576 
(AtMlo1-AtMlo15), AY029312-AY029320 
(ZmMlo1-ZmMlo9), and AY090638-AY090647 (Hordeum 
species in Table 3). 
Appendix A 



Appendix A shows a comparison of the amino acid sequences of Mlo proteins SEQ ID 
NO:32 and barley (NCBI General Identifier No. 1877221, SEQ ID NO: 39). Amino acids 
conserved among both sequences are indicated with an asterisk (*) on the top row; dashes are 
used by the program to maximize alignment of the sequences. The seven membrane-spanning 
helices (underlined), the conserved cysteine residues (▲), the putative nuclear localization motif 
(bold letters), and the two casein kinase II motifs (boxed sequences) were all identified and 
discussed by Bueschges et al. (Cell 1997, 88: 695-705) and Devoto et al. (JBC 1999, 274: 
34993-35004). Sites of mutations as published in Bueschges et al. (Cell 1997, 88: 695-705) are 
indicated in the alignment as follows: (•) mutations, (►) frameshift changes, (A) deletions. 



* * * * ******* * * * *** * **** ** ******** *** ** ***** 

SEQ ID NO: 32 MAEDYEYPPARTLPETPSWAVALVFAVMIIVSVLLEHALHKLGHWFHKRHKNALAEALEK 
Gi:1877221    MSDKKG-VPARELPETPSWAVAVVFAAMVLVSVLMEHGLHKLGHWFQHRHKKALWEALEK 

• • 

************** ****** ***** ** * ** * *** ** * * 

SEQ ID NO: 32 IKAELMLVGFISLLLAVTQDPISG-ICISEKAASIMRPCSLPPGSVK-SKYKDYYCAKKG 
Gi:1877221    MKAELMLVGFISLLLIVTQDPIIAKICISEDAADVMWPCKRGTEGRKPSKYVDY-CPE-G 

▲ ▲ ▲ 

** *********** * * * *** * ******* * * * * * ** * ********* * ********** 

SEQ ID NO: 32 KVSLMSTGSLHQLHIFIFVLAVFHVTYSVIIMALSRLKMRTWKKWETETASLEYQFANDP 
Gi:1877221    KVALMSTGSLHQLHVFIFVLAVFHVTYSVITIALSRLKMRTWKKWETETTSLEYQFANDP 

► • 

******************************************************** *** 

SEQ ID NO: 32 ARFRFTHQTSFVKRHLGLSSTPGIRWVVAFFRQFFRSVTKVDYLTLRAGFINAHLSHNSK 
Gi:1877221    ARFRFTHQTSFVKRHLGLSSTPGIRWVVAFFRQFFRSVTKVDYLTLRAGFINAHLSQNSK 

AA • 
************************************* * *** **************** 

SEQ ID NO: 32 FDFHKYIKRSMEDDFKVVVGISLPLWCVAILTLFLDIDGIGTLTWISFIPLVILLCVGTK 
Gi:1877221    FDFHKYIKRSMEDDFKVVVGISLPLWGVAILTLFLDINGVGTLIWISFIPLVILLCVGTK 

************************************************************ 

SEQ ID NO: 32 LEMIIMEMALEIQDRASVIKGAPVVEPSNKFFWFHRPDWVLFFIHLTLFQNAFQMAHFVW 
Gi:1877221    LEMIIMEMALEIQDRASVIKGAPVVEPSNKFFWFHRPDWVLFFIHLTLFQNAFQMAHFVW 

********** * ********* ********** ************************ 
SEQ ID NO: 32 TVATPGLKKCFHMHIGLSIMKVVLGLALQFLCSYITFPLYALVTQMGSNMKRSIFDEQTA 
Gi:1877221    TVATPGLKKCYHTQIGLSIMKVVVGLALQFLCSYMTFPLYALVTQMGSNMKRSIFDEQTS 

▲ 

*********************************** ****** ***************** 

SEQ ID NO: 32 KALTNWRNTAKEKKKVRDTDMLMAQMIGDATPSRGTSPMPSRASSPVHLLHKGMGRSDDP 
Gi:1877221    KALTNWRNTAKEKKKVRDTDMLMAQMIGDATPSRGSSPMPSRGSSPVHLLHKGMGRSDDP 

********* ******************* ***** ***** ************* 

SEQ ID NO: 32 QSAPTSPRTMEEARDMYPVVVAHPVHRLNPADRRRSVSSSALDADIPSADFSFSQG 
Gi:1877221    QSAPTSPRTQQEARDMYPVVVAHPVHRLNPNDRRRSASSSALEADIPSADFSFSQG 



